Security, Privacy, and Anonymity in Computation, Communication, and Storage
In this paper we propose a novel approach to identify anomalies in DNS traffic. The traffic time-point data is transformed into a string, which a new fast approximate string matching algorithm then uses to detect anomalies. Our approach is generic in nature and allows fast adaptation to different types of traffic. We evaluate the approach on a large public dataset covering 10 days of DNS traffic, discovering more than an order of magnitude more DNS attacks than auto-regression as a baseline. An additional comparison with other common regressors, namely Linear Regression, Lasso, Random Forest and KNN, also shows the superiority of our approach.
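The abstract above does not say how the time series becomes a string or which matching algorithm is used, so the following is only an illustrative sketch under stated assumptions: equal-width binning into a small alphabet, and Hamming distance as a stand-in for the approximate string matcher. The names `to_symbols`, `hamming`, and `anomalous_windows` are hypothetical.

```python
def to_symbols(counts, alphabet="abcd"):
    """Map each per-interval query count to a letter by equal-width binning (assumed scheme)."""
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    k = len(alphabet)
    symbols = []
    for c in counts:
        idx = min(int((c - lo) / span * k), k - 1)  # clamp the maximum into the last bin
        symbols.append(alphabet[idx])
    return "".join(symbols)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def anomalous_windows(series_string, reference, max_mismatch=1):
    """Return start offsets of non-overlapping windows that differ from the
    reference pattern in more than max_mismatch positions."""
    w = len(reference)
    return [
        i
        for i in range(0, len(series_string) - w + 1, w)
        if hamming(series_string[i:i + w], reference) > max_mismatch
    ]
```

For instance, `anomalous_windows(to_symbols(counts), reference)` would flag the windows whose shape departs from a pattern taken from known-normal traffic; the window length and mismatch threshold would need tuning per traffic type.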
Domain Name System (DNS) is one of the earliest vulnerable network protocols, with various security gaps that have been exploited repeatedly over the last decades. DNS abuse is one of the most challenging threats for cybersecurity specialists. However, providing secure DNS remains a major challenge, as attackers use sophisticated methods to inject malicious code into DNS queries. Many researchers have explored different machine learning (ML) techniques to counter this challenge. However, there are still several challenges and barriers to utilizing ML. This paper introduces a systematic approach for identifying malicious and encrypted DNS queries by examining the network traffic and deriving statistical characteristics. Afterward, implementing several ML methods:
IEEE Communications Surveys & Tutorials, 2018
Despite the ubiquitous role of domain name system (DNS) in sustaining the operations of various Internet services (domain name to IP address resolution, email, Web), DNS was abused/misused to perform large-scale attacks that affected millions of Internet users. To detect and prevent threats associated to DNS, researchers introduced passive DNS replication and analysis as an effective alternative approach for analyzing live DNS traffic. In this paper, we survey state of the art systems that utilized passive DNS traffic for the purpose of detecting malicious behaviors on the Internet. We highlight the main strengths and weaknesses of the implemented systems through an in-depth analysis of the detection approach, collected data, and detection outcomes. We highlight an incremental implementation pattern in the studied systems with similarities in terms of the used datasets and detection approach. Furthermore, we show that almost all studied systems implemented supervised machine learning (SML), which has its own limitations. In addition, while all surveyed systems required several hours or even days before detecting threats, we illustrate the ability to enhance performance by implementing a system prototype that utilize big data analytics frameworks to detect threats in near real-time. We demonstrate the feasibility of our threat detection prototype through real-life examples, and provide further insights for future work toward analyzing DNS traffic in near real-time.
The Domain Name System (DNS) is one of the critical components of modern Internet networking. Proper Internet functions (such as mail delivery, web browsing and so on) are typically not possible without the use of DNS. However with the growth and commercialization of global networking, this protocol is often abused for malicious purposes which negatively impacts the security of Internet users. In this paper we perform security data analysis of DNS traffic at large scale for a prolonged period of time. In order to do this, we developed DNSPacketlizer, a DNS traffic analysis tool and deployed it at a mid-scale Internet Service Provider (ISP) for a period of six months. The findings presented in this paper demonstrate persistent abuse of the protocol by Botnet herders and antivirus software vendors for covert communication. Other suspicious or potentially malicious activities in DNS traffic are also discussed.
2015 IEEE Conference on Communications and Network Security (CNS), 2015
DNS has been increasingly abused by adversaries for cyber-attacks. Recent research has leveraged DNS failures (i.e. DNS queries that result in a Non-Existent-Domain response from the server) to identify malware activities, especially domainflux botnets that generate many random domains as a rendezvous technique for command-&-control. Using ISP network traces, we conduct a systematic analysis of DNS failure characteristics, with the goal of uncovering how attackers exploit DNS for malicious activities. In addition to DNS failures generated by domain-flux bots, we discover many diverse and stealthy failure patterns that have received little attention. Based on these findings, we present a framework that detects diverse clusters of suspicious domain names that cause DNS failures, by considering multiple types of syntactic as well as temporal patterns. Our evolutionary learning framework evaluates the clusters produced over time to eliminate spurious cases while retaining sustaining (i.e., highly suspicious) clusters. One of the advantages of our framework is in analyzing DNS failures on per-client basis and not hinging on the existence of multiple clients infected by the same malware. Our evaluation on a large ISP network trace shows that our framework detects at least 97% of the clients with suspicious DNS behaviors, with over 81% precision.
Sensors, 2016
The Domain Name System (DNS) is a critical infrastructure of any network, and, not surprisingly a common target of cybercrime. There are numerous works that analyse higher level DNS traffic to detect anomalies in the DNS or any other network service. By contrast, few efforts have been made to study and protect the recursive DNS level. In this paper, we introduce a novel abstraction of the recursive DNS traffic to detect a flooding attack, a kind of Distributed Denial of Service (DDoS). The crux of our abstraction lies on a simple observation: Recursive DNS queries, from IP addresses to domain names, form social groups; hence, a DDoS attack should result in drastic changes on DNS social structure. We have built an anomaly-based detection mechanism, which, given a time window of DNS usage, makes use of features that attempt to capture the DNS social structure, including a heuristic that estimates group composition. Our detection mechanism has been successfully validated (in a simulated and controlled setting) and with it the suitability of our abstraction to detect flooding attacks. To the best of our knowledge, this is the first time that work is successful in using this abstraction to detect these kinds of attacks at the recursive level. Before concluding the paper, we motivate further research directions considering this new abstraction, so we have designed and tested two additional experiments which exhibit promising results to detect other types of anomalies in recursive DNS servers.
IEEE/ACM Transactions on Networking, 2012
Recent Botnets such as Conficker, Kraken and Torpig have used DNS based "domain fluxing" for command-and-control, where each Bot queries for existence of a series of domain names and the owner has to register only one such domain name. In this paper, we develop a methodology to detect such "domain fluxes" in DNS traffic by looking for patterns inherent to domain names that are generated algorithmically, in contrast to those generated by humans. In particular, we look at distribution of alphanumeric characters as well as bigrams in all domains that are mapped to the same set of IP-addresses. We present and compare the performance of several distance metrics, including KL-distance, Edit distance and Jaccard measure. We train by using a good data set of domains obtained via a crawl of domains mapped to all IPv4 address space and modeling bad data sets based on behaviors seen so far and expected. We also apply our methodology to packet traces collected at a Tier-1 ISP and show we can automatically detect domain fluxing as used by Conficker botnet with minimal false positives, in addition to discovering a new botnet within the ISP trace. We also analyze a campus DNS trace to detect another unknown botnet exhibiting advanced domain name generation technique.
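The bigram-based comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the smoothing constant `eps` and the function names are assumptions, and the grouping of domains by shared IP addresses is left out.

```python
import math
from collections import Counter


def bigrams(name):
    """Return the adjacent character pairs of a domain label."""
    return [name[i:i + 2] for i in range(len(name) - 1)]


def bigram_distribution(labels):
    """Relative frequency of each bigram over a group of domain labels."""
    counts = Counter(bg for label in labels for bg in bigrams(label))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def kl_divergence(p, q, eps=1e-6):
    """KL divergence D(p || q) with additive smoothing (eps is an assumed value)
    so that bigrams missing from one distribution do not produce log(0)."""
    support = set(p) | set(q)
    norm = 1.0 + eps * len(support)
    total = 0.0
    for bg in support:
        ps = (p.get(bg, 0.0) + eps) / norm
        qs = (q.get(bg, 0.0) + eps) / norm
        total += ps * math.log(ps / qs)
    return total


def jaccard(a, b):
    """Jaccard index between the bigram sets of two labels."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

A group of labels whose bigram distribution lies far (in KL terms) from a distribution built on known-good names, or whose members share few bigrams with dictionary-like names, would be a candidate for algorithmic generation; where to set the thresholds is an empirical question that this sketch does not address.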
World Academy of Research in Science and Engineering, 2020
Currently, cyber-attacks have increased rapidly in both the number of attacks and the extent of their damage to organizations and businesses. In particular, cyber-attack techniques based on user-side vulnerabilities are developing very strongly. One of the methods that are commonly used by attackers is distributing malicious domains into users' machines. Because of the serious consequences of the distribution of malicious domains, the problem of early detection of malicious domains is very necessary today. In this paper, we propose a method of detecting malicious domains based on the connection behavior analysis technique using machine learning algorithms. The difference between our research and other studies is shown in looking for and extracting features that accurately represent the behavior of malicious domains and normal domains. Besides, in order to classify the normal domain and malicious domain, we select Random Forest (RF) supervised learning algorithms. In the experimental results, we change the parameters of the RF algorithm to seek the most optimal parameter for the algorithm when applying them to the problem of detecting malicious domains.
Future Internet, 2018
In recent years, botnets have become one of the major threats to information security because they have been constantly evolving in both size and sophistication. A number of botnet detection measures, such as honeynet-based and Intrusion Detection System (IDS)-based, have been proposed. However, IDS-based solutions that use signatures seem to be ineffective because recent botnets are equipped with sophisticated code update and evasion techniques. A number of studies have shown that abnormal botnet detection methods are more effective than signature-based methods because anomaly-based botnet detection methods do not require pre-built botnet signatures and hence they have the capability to detect new or unknown botnets. In this direction, this paper proposes a botnet detection model based on machine learning using Domain Name Service query data and evaluates its effectiveness using popular machine learning techniques. Experimental results show that machine learning algorithms can be used effectively in botnet detection and the random forest algorithm produces the best overall detection accuracy of over 90%.
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, 2013
The performance and operational characteristics of the DNS protocol are of deep interest to the research and network operations community. In this paper, we present measurement results from a unique dataset containing more than 26 billion DNS query-response pairs collected from more than 600 globally distributed recursive DNS resolvers. We use this dataset to reaffirm findings in published work and notice some significant differences that could be attributed both to the evolving nature of DNS traffic and to our differing perspective. For example, we find that although characteristics of DNS traffic vary greatly across networks, the resolvers within an organization tend to exhibit similar behavior. We further find that more than 50% of DNS queries issued to root servers do not return successful answers, and that the primary cause of lookup failures at root servers is malformed queries with invalid TLDs. Furthermore, we propose a novel approach that detects malicious domain groups using temporal correlation in DNS queries. Our approach requires no comprehensive labeled training set, which can be difficult to build in practice. Instead, it uses a known malicious domain as anchor, and identifies the set of previously unknown malicious domains that are related to the anchor domain. Experimental results illustrate the viability of this approach, i.e. , we attain a true positive rate of more than 96%, and each malicious anchor domain results in a malware domain group with more than 53 previously unknown malicious domains on average.
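One possible reading of the anchor-based idea is sketched below: domains that the same client queries close in time to a known malicious anchor are ranked by how many distinct clients show that co-occurrence. The abstract does not specify the correlation measure, so the 60-second window, the client-count score, and the name `related_to_anchor` are all assumptions made for illustration.

```python
from collections import defaultdict


def related_to_anchor(queries, anchor, window=60.0):
    """Rank domains by the number of distinct clients that queried them within
    `window` seconds of querying the anchor domain.

    queries: list of (timestamp_seconds, client_id, domain) tuples.
    """
    # Collect, per client, the times at which the anchor was queried.
    anchor_times = defaultdict(list)
    for ts, client, domain in queries:
        if domain == anchor:
            anchor_times[client].append(ts)

    # Record which clients queried each other domain near an anchor query.
    clients_per_domain = defaultdict(set)
    for ts, client, domain in queries:
        if domain == anchor:
            continue
        if any(abs(ts - t) <= window for t in anchor_times.get(client, ())):
            clients_per_domain[domain].add(client)

    # Highest client count first; ties broken alphabetically for determinism.
    return sorted(
        ((d, len(c)) for d, c in clients_per_domain.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

A popular benign domain would also score highly under this naive count, so a practical version would need some normalization against overall query popularity.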
2020
This paper presents a systematic two-layer approach for detecting DNS over HTTPS (DoH) traffic and distinguishing Benign-DoH traffic from Malicious-DoH traffic using six machine learning algorithms. The capability of machine learning classifiers is evaluated considering their accuracy, precision, recall, and F-score, confusion matrices, ROC curves, and feature importance. The results show that LGBM and XGBoost algorithms outperform the other algorithms in almost all the classification metrics reaching the maximum accuracy of 100% in the classification tasks of layers 1 and 2. LGBM algorithms only misclassified one DoH traffic test as non-DoH out of 4000 test datasets. It has also found that out of 34 features extracted from the CIRA-CIC-DoHBrw-2020 dataset, SourceIP is the critical feature for classifying DoH traffic from non-DoH traffic in layer one followed by DestinationIP feature. However, only DestinationIP is an important feature for LGBM and gradient boosting algorithms when classifying Benign-DoH from Malicious-DoH traffic in layer 2.
IEEE/ACM Transactions on Networking, 2017
Network-wide activity is when one computer (the originator) touches many others (the targets). Motives for activity may be benign (mailing lists, content-delivery networks, and research scanning), malicious (spammers and scanners for security vulnerabilities), or perhaps indeterminate (ad trackers). Knowledge of malicious activity may help anticipate attacks, and understanding benign activity may set a baseline or characterize growth. This paper identifies domain name system (DNS) backscatter as a new source of information about networkwide activity. Backscatter is the reverse DNS queries caused when targets or middleboxes automatically look up the domain name of the originator. Queries are visible to the authoritative DNS servers that handle reverse DNS. While the fraction of backscatter they see depends on the server's location in the DNS hierarchy, we show that activity that touches many targets appear even in sampled observations. We use information about the queriers to classify originator activity using machine-learning. Our algorithm has reasonable accuracy and precision (70-80%) as shown by data from three different organizations operating DNS servers at the root or country level. Using this technique, we examine nine months of activity from one authority to identify trends in scanning, identifying bursts corresponding to Heartbleed, and broad and continuous scanning of secure shell.
J. Internet Serv. Inf. Secur., 2015
DNS protocol is critically important for secure network operations. All networked applications request DNS protocol to translate the network domain names to correct IP addresses. The DNS protocol is prone to attacks like cache poisoning attacks and DNS hijacking attacks that can lead to compromising user’s accounts and stored information. In this paper, we present an anomaly based Intrusion Detection System (IDS) for the DNS protocol (DNS-IDS) that models the normal operations of the DNS protocol and accurately detects any abnormal behavior or exploitation of the protocol. The DNS-IDS system operates in two phases, the training phase and the operational phase. In the training phase, the normal behavior of the DNS protocol is modeled as a finite state machine where we derive the temporal statistics of normal DNS traffic. Then we develop an anomaly metric for the DNS protocol that is a function of the temporal statistics for both the normal and abnormal transitions of the DNS protocol...
2014 6th International Conference on New Technologies, Mobility and Security (NTMS), 2014
This work proposes a novel approach to infer and characterize Internet-scale DNS amplification DDoS attacks by leveraging the darknet space. Complementary to the pioneer work on inferring Distributed Denial of Service (DDoS) activities using darknet, this work shows that we can extract DDoS activities without relying on backscattered analysis. The aim of this work is to extract cyber security intelligence related to DNS Amplification DDoS activities such as detection period, attack duration, intensity, packet size, rate and geo-location in addition to various network-layer and flow-based insights. To achieve this task, the proposed approach exploits certain DDoS parameters to detect the attacks. We empirically evaluate the proposed approach using 720 GB of real darknet data collected from a /13 address space during a recent three months period. Our analysis reveals that the approach was successful in inferring significant DNS amplification DDoS activities including the recent prominent attack that targeted one of the largest anti-spam organizations. Moreover, the analysis disclosed the mechanism of such DNS amplification DDoS attacks. Further, the results uncover high-speed and stealthy attempts that were never previously documented. The case study of the largest DDoS attack in history lead to a better understanding of the nature and scale of this threat and can generate inferences that could contribute in detecting, preventing, assessing, mitigating and even attributing of DNS amplification DDoS activities.
2017
DNS tunneling is one of the issues that have concerned the information security community in the last decade. Such malicious activity represents a genuine threat for many corporations, where a considerable amount of network traffic may carry embedded DNS tunneling. The threats caused by such tunneling range from full remote control to file transfer or even a full IP tunnel. Therefore, different approaches have been proposed for detecting DNS tunneling, such as firewalls and intrusion detection systems. However, these approaches are limited to specific types of tunneling. Researchers have therefore turned to machine learning techniques because of their ability to analyze and predict the occurrence of DNS tunneling. Nonetheless, there are plenty of choices for employing specific machine learning techniques. This paper aims to provide a comparative study of three machine learning techniques: SVM, NB and J48. A benchmark dataset for the...
International Journal of Information Security
Domain names are at the base of today's cyber-attacks. Attackers abuse the domain name system (DNS) to mystify their attack ecosystems; they systematically generate a huge volume of distinct domain names to make it infeasible for blacklisting approaches to keep up with newly generated malicious domain names. To solve this problem, we propose DomainProfiler for discovering malicious domain names that are likely to be abused in future. The key idea with our system is to exploit temporal variation patterns (TVPs) of domain names. The TVPs of domain names include information about how and when a domain name has been listed in legitimate/popular and/or malicious domain name lists. On the basis of this idea, our system actively collects historical DNS logs, analyzes their TVPs, and predicts whether a given domain name will be used for malicious purposes. Our evaluation revealed that DomainProfiler can predict malicious domain names 220 days beforehand with a true positive rate of 0.985. Moreover, we verified the effectiveness of our system in terms of the benefits from our TVPs and defense against cyber-attacks. Keywords Network-level security and protection • Domain name • DNS • Malware • Temporal variation pattern This paper is the extended version of the paper presented at IEEE/IFIP DSN 2016 [15].
Journal of Intelligent & Fuzzy Systems, 2017
Anomalous traffics are those unusual and colossal hits a non-popular domain gets for a small epoch period in a day. Regardless of whether these anomalies are malicious or not, it is important to analyze them as they might have a dramatic impact on a customer or an end user. Identifying these traffic anomalies is a challenge, as it requires mining and identifying patterns among huge volume of data. In this paper, we provide a statistical and dynamic reputation based approach to identify unpopular domains receiving huge volumes of traffic within a short period of time. Our aim is to develop and deploy a lightweight framework in a monitored network capable of analyzing DNS traffic and provide early warning alerts regarding domains receiving unusual hits to reduce the collateral damage faced by an end-user or customer. The authors have employed statistical analysis, supervised learning and ensemble based dynamic reputation of domains, IP addresses and name servers to distinguish benign and abnormal domains with very low false positives.
Electronics
The domain name system (DNS) plays a vital role in network services for name resolution. By default, this service is seldom blocked by security solutions. Thus, it has been exploited for security breaches using the DNS covert channel (tunnel). One of the greatest current data leakage techniques is DNS tunneling, which uses DNS packets to exfiltrate sensitive and confidential data. Data protection against stealthy exfiltration attacks is critical for human beings and organizations. As a result, many security techniques have been proposed to address exfiltration attacks starting with building security policies and ending with designing security solutions, such as firewalls, intrusion detection or prevention, and others. In this paper, a hybrid DNS tunneling detection system has been proposed based on the packet length and selected features for the network traffic. The proposed system takes advantage of the outcome results conducted using the testbed and Tabu-PIO feature selection algo...
Arxiv preprint arXiv: …, 2009
The botnet is considered a critical issue for the Internet due to its fast-growing mechanism and effect. Recently, botnets have utilized the DNS and query DNS servers just like any legitimate host. In this case, it is difficult to distinguish between legitimate and illegitimate DNS traffic. It is important to build a suitable solution for botnet detection in the DNS traffic and consequently protect the network from malicious botnet activities. In this paper, a simple mechanism is proposed that monitors the DNS traffic and detects the abnormal DNS traffic issued by the botnet, based on the fact that botnets appear as a group of hosts periodically. The proposed mechanism is also able to classify the DNS traffic requested by a group of hosts (group behavior) and by single hosts (individual behavior), and consequently detect the abnormal domain names issued by malicious botnets. Finally, the experimental results showed that the proposed mechanism is robust, is able to classify DNS traffic, and efficiently detects botnet activity with an average detection rate of 89%.
2018 IEEE Global Communications Conference (GLOBECOM), 2018
Domain Name System (DNS) is a crucial component of current IP-based networks as it is the standard mechanism for name to IP resolution. However, due to its lack of data integrity and origin authentication processes, it is vulnerable to a variety of attacks. One such attack is Typosquatting. Detecting this attack is particularly important as it can be a threat to corporate secrets and can be used to steal information or commit fraud. In this paper, a machine learning-based approach is proposed to tackle the typosquatting vulnerability. To that end, exploratory data analytics is first used to better understand the trends observed in eight domain name-based extracted features. Furthermore, a majority voting-based ensemble learning classifier built using five classification algorithms is proposed that can detect suspicious domains with high accuracy. Moreover, the observed trends are validated by studying the same features in an unlabeled dataset using K-means clustering algorithm and through applying the developed ensemble learning classifier. Results show that legitimate domains have a smaller domain name length and fewer unique characters. Moreover, the developed ensemble learning classifier performs better in terms of accuracy, precision, and F-score. Furthermore, it is shown that similar trends are observed when clustering is used. However, the number of domains identified as potentially suspicious is high. Hence, the ensemble learning classifier is applied with results showing that the number of domains identified as potentially suspicious is reduced by almost a factor of five while still maintaining the same trends in terms of features' statistics.
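Two of the features named in the results above (domain name length and number of unique characters) and the majority-voting step can be sketched as follows. The abstract does not list all eight features or the five classifiers, so this only shows the general shape; `name_features` and `majority_vote` are hypothetical helper names, and taking the first dot-separated label is a simplifying assumption.

```python
from collections import Counter


def name_features(domain):
    """Compute two name-based features from the first label of a domain."""
    label = domain.split(".")[0]
    return {"length": len(label), "unique_chars": len(set(label))}


def majority_vote(predictions):
    """Return the class predicted by the most classifiers.

    predictions: list of class labels, one per classifier. With an odd number
    of binary classifiers (five in the abstract above) no tie can occur.
    """
    return Counter(predictions).most_common(1)[0][0]
```

In use, each trained classifier would produce one label for the feature vector of a domain, and `majority_vote` would combine those labels into the ensemble decision.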
2018
The Domain Name System (DNS) is an essential network service translating human-friendly host names into numerical IP addresses. Prior to almost any network communication, a communication with a DNS server is most likely needed. For this reason, DNS cyber-attacks are now among the most challenging threats in the information security community, owing to the wide availability of DNS and the fact that it is not monitored in terms of security and is not intended for data transfer. In particular, DNS tunnelling, which embeds data in DNS queries and responses, has received a lot of attention in the research field over recent years. Recent studies have focused on DNS tunnelling detection using machine learning. The aim of this paper is to provide a comprehensive survey of different techniques recently proposed in the literature for detecting DNS tunnels using machine learning, while highlighting the main findings and comparing their obtained results. Keywords: Domain Name System, Cyber-attacks, Tun...