To make web browsing faster and to avoid redundant processing on repeated requests for the same web page, many caching techniques have been developed to reduce the latency of retrieving data on the World Wide Web. In this paper, we give a quick overview of existing web caching techniques for dynamic web pages, and then introduce a design and implementation model that takes advantage of the "URL Rewriting" feature in some popular web servers, e.g., Apache, to provide an effective approach to caching dynamic web pages.
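The abstract does not spell out the design, so the following is only a minimal sketch of the general idea behind rewrite-based caching: a dynamic URL is mapped to a static file, and the page is regenerated only when that file is missing. In a real deployment the lookup would be done by the web server's rewrite rules rather than in Python; the names `cache_path`, `serve`, and `render`, and the hash-based file naming, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_path(url: str, cache_dir: Path) -> Path:
    """Map a dynamic URL to a static file path (hypothetical naming scheme)."""
    parts = urlsplit(url)
    # Sort query parameters so equivalent requests share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query)))
    key = hashlib.sha256(f"{parts.path}?{query}".encode()).hexdigest()
    return cache_dir / f"{key}.html"


def serve(url: str, cache_dir: Path, render) -> str:
    """Return the cached page if present; otherwise render and store it."""
    target = cache_path(url, cache_dir)
    if target.exists():
        # Cache hit: the dynamic page generator is never invoked.
        return target.read_text(encoding="utf-8")
    html = render(url)  # cache miss: run the (assumed) page generator once
    target.write_text(html, encoding="utf-8")
    return html
```

A sketch like this leaves out cache invalidation, which any practical scheme would need when the underlying data changes.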
International Journal of Computers and Their Applications, 2008
This paper presents a model for building a Semantic News Site (SNS) based on available, valuable legacy documents. It consists of four phases: extracting the semantic structure of a news site; building a unified semantic structure for a group of similar news sites; extracting the metadata for each article in the site; and extracting knowledge from the raw text based on a guiding ontology. This will enhance search facilities and enable computers to understand and process documents. Consequently, it will save the time spent extracting valuable data from unstructured documents such as HTML documents. This work may also be used to build a user profile (personalization) so that only the parts of the site that match the user's interests are extracted. The proposed model is a first step towards future semantic news sites. It has loosely coupled modules and can be implemented with any distributed and heterogeneous technology, such as components, agents, and web services.
Metamorphic virus recognition is the most challenging task for antivirus software, because such viruses change their appearance and structure on each new infection, which makes them the hardest to detect. In this study, the authors present an effective system for metamorphic virus recognition based on statistical machine learning techniques. The authors' approach achieved a high detection rate for the tested metamorphic virus classes with very few false positives. The system is also able to learn new virus patterns for future recognition. The authors conclude with an analysis of their simulation results and proposed enhancements to the system for detecting other virus classes.
MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM)
The large number of malicious files produced daily outpaces the current capacity of malware analysis and detection. For example, Intel Security Labs reported that during the second quarter of 2016, their system found more than 40M new malware samples [1]. The damage caused by malware attacks is also increasingly devastating, as witnessed by the recent Cryptowall malware, which has reportedly generated more than $325M in ransom payments to its perpetrators [2]. In terms of defense, it is widely accepted that the traditional approach based on byte-string signatures is increasingly ineffective, especially for new malware samples and sophisticated variants of existing ones. New techniques are therefore needed for effective defense against malware. Motivated by this problem, the paper investigates a new technique for the automatic identification of the malware packers that are used to obfuscate malware programs. Signatures of malware packers and obfuscators are extracted from the control flow graphs (CFGs) of malware samples. Unlike conventional byte signatures, which can be evaded by simply modifying one or more bytes in a malware sample, these signatures are more difficult to evade. For example, CFG-based signatures are shown to be resilient against instruction modifications and shuffling, as a single signature is sufficient for detecting mildly different versions of the same malware. Finally, the process for extracting CFG-based signatures is also automated.
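To illustrate why a signature taken from a CFG can survive instruction changes and reordering, here is a minimal sketch that hashes only the shape of the graph. It is not the paper's extraction method, which the abstract does not describe; the adjacency-dict representation, the function name `cfg_signature`, and the use of a degree profile are all assumptions chosen for brevity.

```python
import hashlib


def cfg_signature(cfg: dict[str, list[str]]) -> str:
    """Hash the shape of a CFG, ignoring block labels and instruction bytes.

    `cfg` maps each basic-block label to the labels of its successors
    (a hypothetical representation used only for this sketch).
    """
    in_degree = {node: 0 for node in cfg}
    for successors in cfg.values():
        for succ in successors:
            in_degree[succ] = in_degree.get(succ, 0) + 1
    # One (in-degree, out-degree) pair per block, sorted so that the
    # order and naming of blocks do not affect the result.
    profile = sorted((in_degree[n], len(cfg.get(n, []))) for n in in_degree)
    return hashlib.sha256(repr(profile).encode()).hexdigest()


# Two samples with the same branching structure but different block names.
sample_a = {"entry": ["b1", "b2"], "b1": ["exit"], "b2": ["exit"], "exit": []}
sample_b = {"x": ["q", "p"], "p": ["z"], "q": ["z"], "z": []}
print(cfg_signature(sample_a) == cfg_signature(sample_b))
```

A degree profile is a deliberately weak invariant: distinct graphs can share one, so a practical signature would need a stronger structural comparison.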
Journal of Computer Virology and Hacking Techniques
Malware detection is still an open problem. Numerous attacks take place every day in which malware is used to steal private information, disrupt services, or sabotage industrial systems. In this paper, we combine three kinds of contextual information, namely static, dynamic, and instruction-based, for malware detection. This leads to the definition of more than thirty thousand features, a large feature set that covers a wide range of a sample's characteristics. Through experiments with one million files, we show that this feature set leads to machine-learning-based models that can detect both malware seen roughly at the time the models are built and malware first seen months after the models were built (i.e., the detection models remain effective months ahead of time). This may be due to the comprehensiveness of the feature set.
2014 IEEE Military Communications Conference, 2014
Every day, thousands of malware samples are released online. The vast majority of them employ some kind of obfuscation, ranging from simple XOR encryption to more sophisticated anti-analysis, packing, and encryption techniques. Dynamic analysis methods can unpack a file and reveal its hidden code. However, these methods are very time-consuming compared to static analysis. Moreover, considering the large amount of new malware produced daily, it is not practical to depend solely on dynamic analysis. Therefore, finding an effective way to filter the samples and delegate only obfuscated and suspicious ones to more rigorous tests would significantly improve the overall scanning process. Current techniques for identifying obfuscation rely mainly on signatures of known packers, file entropy scores, or anomalies in the file header. However, these features are not only easily bypassed but also fail to cover all types of obfuscation. In this paper, we introduce a novel approach to identifying obfuscated files based on anomalies in their instruction-based characteristics. We detect the presence of interleaving instructions, which result from the opaque predicate anti-disassembly trick, and present distinguishing statistical properties based on the opcodes and control flow graphs of obfuscated files. Our detection system combines these features with other structural features of the file and achieves very good results in detecting obfuscated malware.
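The file entropy score mentioned above as one of the current techniques can be sketched as the Shannon entropy of a file's byte distribution. This shows the baseline that the abstract argues is insufficient, not the paper's instruction-based approach; the function name `byte_entropy` is an assumption.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Constant data carries no information; uniformly spread bytes carry the most.
print(byte_entropy(b"\x00" * 64))
print(byte_entropy(bytes(range(256))))
```

Compressed or encrypted content tends to score close to 8 bits per byte, which is why entropy is used as an indicator of packing. Any cutoff value is a tuning choice that the abstract does not specify.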
Papers by Moustafa Saleh