2010, 2010 10th International Conference on Hybrid Intelligent Systems
This paper examines the performance of a new Hidden Markov Model (HMM) structure used as the core of an Internet traffic classifier and compares the results against other models in the literature. Traffic modeling and classification are important in many areas, such as bandwidth management, traffic analysis, prediction and engineering, network planning, Quality of Service provisioning, and anomalous traffic detection. The new HMM structure, which takes into account the packet payload size (PS) and inter-packet time (IPT) sequences, is obtained by concatenating a first part structured as a profile HMM with a second part structured as a fully-connected HMM. The first part captures the specific properties of the initial protocol packets, while the second part captures the statistical properties of the whole sequence present in the flow. The generated models are found to increase the accuracy of classifying different traffic classes in the analysed dataset. The average accuracy obtained by the classifier is 62.5% after seeing only five packets, 80.0% after examining 13 packets, and 95.5% after seeing the entire unidirectional flow.
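The concatenated structure described in this abstract can be illustrated with a small transition matrix. The sketch below is only an assumption about the general shape: the profile part is reduced to a plain left-to-right chain (a full profile HMM would also have insert and delete states), the fully-connected part uses uniform transition probabilities, and the function name `concatenated_transitions` is hypothetical. The abstract does not give the authors' actual parameters.

```python
def concatenated_transitions(n_profile, n_ergodic):
    """Transition matrix for a left-to-right chain followed by a
    fully-connected block (both sizes must be at least 1)."""
    n = n_profile + n_ergodic
    trans = [[0.0] * n for _ in range(n)]
    # Profile part: each state moves deterministically to the next one,
    # so early packets are modelled position by position.
    for i in range(n_profile - 1):
        trans[i][i + 1] = 1.0
    # The last profile state and every fully-connected state can reach
    # any fully-connected state (uniform probabilities as a placeholder).
    for i in range(n_profile - 1, n):
        for j in range(n_profile, n):
            trans[i][j] = 1.0 / n_ergodic
    return trans

for row in concatenated_transitions(3, 2):
    print(row)
```

Once a flow leaves the chain it never returns to it, which matches the idea that the first part models only the initial protocol packets.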
Global …, 2008
Traffic classification and identification is a fertile research area. Beyond Quality of Service, service differentiation, and billing, one of the most important applications of traffic classification is in the field of network security. This paper proposes a packet-level traffic classification approach based on the Hidden Markov Model (HMM). Classification is performed by using real network traffic and estimating, in a combined fashion, Packet Size (PS) and Inter Packet Time (IPT) characteristics, so the approach remains applicable to encrypted traffic. The effectiveness of the proposed approach is evaluated by considering several traffic typologies: we applied our model to real traffic traces of Age of Mythology and Counter Strike (two Multi Player Network Games), HTTP, SMTP, eDonkey, PPLive (a peer-to-peer IPTV application), and MSN Messenger. An analytical basis and the mathematical details of the model are given. Results show that the proposed approach is able to classify network traffic by using packet-level statistical properties, which makes it a good candidate as a component of a multi-classification framework.
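As a rough illustration of HMM-based classification on joint PS and IPT observations, the sketch below scores a flow under one discrete HMM per traffic class with the scaled forward algorithm and picks the most likely class. Everything specific here is an assumption made for the example: the quantization thresholds, the two-state toy models in `MODELS`, and the function names are hypothetical and do not come from the paper, which may model PS and IPT with continuous distributions.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    log_lik = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[s] * emit[s][symbol] for s in range(n)]
        else:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik

def quantize(ps_bytes, ipt_seconds):
    """Map a (PS, IPT) pair to one of four joint symbols (assumed thresholds)."""
    ps_bin = 0 if ps_bytes < 500 else 1
    ipt_bin = 0 if ipt_seconds < 0.05 else 1
    return 2 * ps_bin + ipt_bin

# Toy models: (start probabilities, transition matrix, emission matrix).
MODELS = {
    "game": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
             [[0.7, 0.1, 0.1, 0.1], [0.4, 0.4, 0.1, 0.1]]),
    "bulk": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
             [[0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.4, 0.4]]),
}

def classify_flow(flow, models):
    """Return the class whose HMM assigns the flow the highest likelihood."""
    obs = [quantize(ps, ipt) for ps, ipt in flow]
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

print(classify_flow([(80, 0.01), (90, 0.02), (70, 0.01)], MODELS))
```

Because only packet sizes and timings are used, the same scoring step works whether or not the payload is encrypted.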
Computer Networks, 2008
Responsible Editor: M. Smirnow
2011 International Conference on Multimedia Computing and Systems, 2011
Over the last decade, traffic classification has been a field of great interest to most network researchers because of its importance, and many successful results have already been reached. We present a packet classification method based on the Hidden Markov Model (HMM). The HMM states are derived from the flag bits of a TCP packet. For the HMM observations, we focus on a packet property that does not depend on the payload information [7], which avoids the encryption problem; the chosen observation is the distribution of the packet size for each state of the HMM. The method has been verified and validated on several applications: streaming video over HTTP (youtube.com, dailymotion.com, justin.tv), peer-to-peer video streaming, and File Transfer Protocol (FTP). Compared with related work in this field, our method gives satisfactory results.
This paper presents an improved variant of our Markov-based TCP traffic classifier and demonstrates its performance using traffic captured in a university network. Payload length, flow direction, and position of the first data packets of a TCP connection are reflected in the states of the Markov models. In addition, we integrate a new "end of connection" state to further improve the classification accuracy. Using 10-fold cross validation, we identify appropriate settings for the payload length intervals and the number of data packets considered in the models. Finally, we discuss the classification results for the different applications.
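A minimal sketch of a Markov-chain classifier in the spirit of this abstract follows. It encodes each data packet as a state built from flow direction and a payload length interval, appends an explicit end-of-connection state, and classifies by log-likelihood. The interval bounds, the add-one smoothing, and all names are assumptions for illustration. The sketch also ignores packet position, which the paper's models do reflect.

```python
import math
from collections import defaultdict

START, END = "START", "END"
BOUNDS = (100, 1000)                 # hypothetical payload length interval bounds
N_NEXT = 2 * (len(BOUNDS) + 1) + 1   # possible next states, including END

def to_state(payload_len, from_client):
    """Encode direction ('C' or 'S') and payload length interval index."""
    interval = sum(payload_len > b for b in BOUNDS)
    return ("C" if from_client else "S") + str(interval)

def state_sequence(connection):
    return [START] + [to_state(n, d) for n, d in connection] + [END]

def train(connections):
    """Count state transitions; return a smoothed log-probability function."""
    counts = defaultdict(lambda: defaultdict(int))
    for conn in connections:
        seq = state_sequence(conn)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1

    def log_prob(a, b):
        row = counts.get(a, {})
        # Add-one smoothing so unseen transitions keep a nonzero probability.
        return math.log((row.get(b, 0) + 1) / (sum(row.values()) + N_NEXT))

    return log_prob

def classify_connection(connection, models):
    """Pick the class whose chain gives the state sequence the highest score."""
    seq = state_sequence(connection)
    return max(models, key=lambda name: sum(models[name](a, b)
                                            for a, b in zip(seq, seq[1:])))

models = {
    "http": train([[(300, True), (1400, False), (1400, False)],
                   [(250, True), (1300, False)]]),
    "smtp": train([[(50, False), (20, True), (60, False), (30, True)]]),
}
print(classify_connection([(280, True), (1200, False)], models))
```

Each connection is a list of `(payload_length, from_client)` pairs. The training data above is invented purely to make the example run.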
Proc. of IEEE …, 2006
Traffic modeling is a fertile research area. This paper proposes a packet-level model of traffic sources based on the Hidden Markov Model. It has been developed by using real network traffic and estimating Packet Size and Inter Packet Time in a combined fashion. The effectiveness of the proposed model is evaluated by studying several traffic types with strong differences in terms of both applications/users and protocol behavior. Specifically, we applied our model to real traffic traces of Age of Mythology (a Multi Player Network Game), SMTP, and HTTP. An analytical basis and the mathematical details of the model are given. Results show that the proposed model captures first-order statistics, as well as temporal dynamics via auto- and cross-correlation. The capability to accurately replicate the considered traffic sources is also shown. Finally, preliminary results for model-based traffic prediction are encouraging.
IEEE Globecom 2006, 2006
Internet traffic classification plays an important role in network management. Operators need to better predict future traffic behavior to identify anomalous situations. We present an approach to traffic classification that uses the Hidden Naive Bayes model and a supervised discretization scheme. This approach achieves good performance on a range of application types while accessing only the information that remains unchanged after encryption. First, we use a supervised method based on the idea behind Holte's 1R algorithm to discretize continuous features derived from packet headers. Then, to assign flows to their respective classes, we use the Hidden Naive Bayes (HNB) model. Finally, we test our scheme on a subset of two data sets and compare it to the Tree-Augmented Naive Bayes (TAN) algorithm. Two performance measures, Accuracy (Auc) and Trust, are used for quantitative analysis of our results. Experimental results show that our HNB-based modeling approach not only achieves higher performance on both measures than the TAN algorithm but also learns very well even with a small number of training flows.
International Journal of Recent Technology and Engineering (IJRTE), 2019
New developments in the architecture of the Internet have increased Internet traffic. The introduction of Peer to Peer (P2P) applications is affecting the performance of traditional Internet applications. Network optimization is used to monitor and manage Internet traffic and improve the performance of Internet applications. The existing optimization methods are not able to provide better management for networks. Machine learning (ML) is one of the familiar techniques for handling Internet traffic. It is used to identify and reduce the traffic. The lack of relevant datasets has reduced the performance of ML techniques in the classification of Internet traffic. The aim of the research is to develop a hybrid classifier to classify Internet traffic data and mitigate the traffic. The proposed method is deployed in the classification of traffic traces of University Technology Malaysia. The method has produced an accuracy of 98.3% with less computation time.
International Journal of Engineering Research and, 2015
Traffic classification is a method of categorizing computer network traffic into a number of traffic classes based on various features observed passively in the traffic. The rapid increase in the variety of Internet application behaviors has raised the need to distinguish applications for filtering, accounting, advertising, network design, and similar purposes. Many traditional methods, such as port-based and packet-based methods, and some alternative methods based on machine learning have been used for the classification process. This paper proposes a new traffic classification scheme that utilizes the information among the correlated traffic flows generated by an application. Discretized statistical features are extracted and used to represent the traffic flows. Irrelevant and redundant features are removed from the feature set by correlation-based feature selection, which favors high class-specific correlation and low inter-correlation. Naïve Bayes with Discretization (NBD) is used for the classification process. The proposed scheme is compared with three other Bayesian models. The experimental evaluation shows that NBD outperforms the other methods even with a small number of supervised training samples.
ICTACT Journal on Communication Technology
Real-time Internet traffic classification is imperative for service discrimination, network security, and network monitoring. Classification of traffic usually depends on the first few network packets of full flows of captured IP traffic. In practice, a real-world deployment expects a correct classification decision well before a flow has ended, even if the start of the traffic flow is missed. This is achieved by calculating features from a few N network packets, taken at any random time instant and at any random point in the duration of the flow. This research proposes a novel parameter, Relative Uncertainty (RU), to estimate the level of diversity of Internet traffic, which can then be used to characterize Internet traffic. Small sub-flows from full flows are selected based on the minimum RU value (MRUB-SFs: Minimum RU Based Sub Flows), and features are then calculated for training the C4.5 ML classifier. Experimentation is carried out with various standard datasets and yields a stable accuracy of 99.3167% for different classes of applications.
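The abstract does not define RU. One common definition, assumed here, is the Shannon entropy of the observed values divided by its maximum for the number of distinct values seen, which gives a score between 0 (no diversity) and 1 (maximum diversity). Under that assumption, selecting a minimum-RU sub-flow of N packets could be sketched as follows; the function names and the choice of packet size as the observed feature are hypothetical.

```python
import math
from collections import Counter

def relative_uncertainty(values):
    """Normalized entropy of a sequence: 0.0 when all values are equal,
    1.0 when the distinct values are uniformly distributed."""
    counts = Counter(values)
    if len(counts) <= 1:
        return 0.0
    n = len(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(len(counts))

def min_ru_subflow(packet_sizes, n):
    """Return (start index, window) of the n-packet window with the lowest RU.
    Ties are resolved in favour of the earliest window."""
    starts = range(len(packet_sizes) - n + 1)
    best = min(starts, key=lambda i: relative_uncertainty(packet_sizes[i:i + n]))
    return best, packet_sizes[best:best + n]

sizes = [100, 1500, 60, 60, 60, 60, 1500, 100]
print(min_ru_subflow(sizes, 4))
```

Features for the classifier would then be computed from the selected window instead of from the first packets of the flow.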
… of the 3rd annual ACM workshop …, 2007
IEEE Communications Surveys and Tutorials, 2008
The research community has begun looking for IP traffic classification techniques that do not rely on 'well known' TCP or UDP port numbers, or interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification, an inter-disciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the employment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.
Journal of Computer Networks and Communications, 2016
Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, which requires additional packet-level information, host behaviour analysis, and specialized hardware, limiting their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average acc...
International Journal of Information and Communication Technology Research, 2022
Almost every industry has been revolutionized by Artificial Intelligence. The telecommunication industry is one of them, aiming to improve customers' Quality of Service and Quality of Experience by enhancing networking infrastructure capabilities, which could lead to much higher rates even in 5G networks. To this end, network traffic classification methods for identifying and classifying user behavior have been used. Traditional analysis with statistical-based, port-based, payload-based, and flow-based methods was the key for these systems before the 4th industrial revolution. Combining AI with such methods leads to higher accuracy and better performance. In the last few decades, numerous studies have been conducted on Machine Learning and Deep Learning, but there are still some doubts about using DL over ML or vice versa. This paper investigates challenges in ML/DL use cases by exploring more than 140 similar studies. We then analyze the results and visualize a practical way of classifying Internet traffic for popular applications.
Springer eBooks, 2008
We address the problem of classifying Internet packet flows according to the application level protocol that generated them. Unlike deep packet inspection, which reads up to application layer payloads and keeps track of packet sequences, we consider classification based on statistical features extracted in real time from the packet flow, namely IP packet lengths and inter-arrival times. A statistical classification algorithm is proposed, built upon the powerful and rich tools of cluster analysis. By exploiting traffic traces taken at the Networking Lab of our Department and traces from CAIDA, we defined data sets made up of thousands of flows for up to five different application protocols. With the classic approach of training and test data sets, we show that cluster analysis yields very good results despite the little information it is based on, which is kept small to meet the real-time decision requirement. We aim to show that the investigated applications are characterized by a "signature" at the network layer that can be useful to recognize such applications even when the port number is not significant. Numerical results are presented to highlight the effect of major algorithm parameters. We discuss the complexity and possible exploitation of the statistical classifier.
In this paper, an automated system is built that processes packets captured from the network. Machine learning algorithms are used to build a traffic classifier that classifies packets as malicious or non-malicious. Previously, many traditional tool-based methods were used to classify network packets; this approach instead uses machine learning, which is an open field to explore and has provided outstanding results so far. The main aim is to monitor traffic, analyze it, and control intruders. CTU-13, a dataset of botnet traffic, is used to develop a traffic classification system based on the features of the packets captured on the network. This type of classification will help IT administrators identify the unknown attacks that are spreading in the IT industry.
2013 Proceedings IEEE INFOCOM, 2013
Network visibility is a critical part of traffic engineering, network management, and security. Recently, unsupervised algorithms have been envisioned as a viable alternative for automatically identifying classes of traffic. However, the accuracy achieved so far does not allow them to be used for traffic classification in practical scenarios. In this paper, we propose SeLeCT, a Self-Learning Classifier for Internet traffic. It uses unsupervised algorithms along with an adaptive learning approach to let classes of traffic emerge automatically, so that they can be identified and (easily) labeled. SeLeCT automatically groups flows into pure (or homogeneous) clusters using alternating simple clustering and filtering phases to remove outliers. SeLeCT uses an adaptive learning approach to boost its ability to spot new protocols and applications. Finally, SeLeCT also simplifies label assignment (which is still based on some manual intervention) so that proper class labels can be easily discovered. We evaluate the performance of SeLeCT using traffic traces collected in different years from various ISPs located in 3 different continents. Our experiments show that SeLeCT achieves overall accuracy close to 98%. Unlike state-of-the-art classifiers, the biggest advantage of SeLeCT is its ability to help discover new protocols and applications in an almost automated fashion.
International Journal of Parallel Programming, 2015
Accurately identifying network traffic at its early stage is very important for applications of traffic identification. In recent years, more and more studies have tried to build effective machine learning models that identify traffic from the first few packets. Packet sizes and statistical features have proved to be effective and are widely used in early stage traffic identification. However, an important issue is still unaddressed: whether there are essential differences in effectiveness between the two kinds of features. In this paper, we set out to evaluate the effectiveness of statistical features in comparison with packet sizes. We first extract the packet sizes of the first six packets and their statistical features on three traffic data sets. Then the mutual information between each feature and the corresponding traffic type label is computed to show the effectiveness of the feature. We then run crossover identification experiments with different feature sets using ten well-known machine learning classifiers. Our experimental results show that most classifiers achieve almost the same performance using packet sizes and statistical features for early stage traffic identification, and that most classifiers can achieve high identification accuracy using only two statistical features.
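The mutual information step mentioned in this abstract can be sketched for discrete (or already binned) feature values as follows. This is a generic plug-in estimate from empirical frequencies; the paper's exact estimator and binning are not stated in the abstract, and the function name and toy data are assumptions.

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(feature_values)
    px = Counter(feature_values)
    py = Counter(labels)
    pxy = Counter(zip(feature_values, labels))
    # Sum over observed (x, y) pairs of p(x,y) * log2(p(x,y) / (p(x) p(y))).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# A binned first-packet size that separates two traffic types perfectly
# carries 1 bit about the label; an unrelated feature carries none.
labels = ["web", "web", "p2p", "p2p"]
print(mutual_information([0, 0, 1, 1], labels))
print(mutual_information([0, 1, 0, 1], labels))
```

Ranking features by this score gives a classifier-independent view of how much each one tells about the traffic type.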
2010 2nd International Workshop on Intelligent Systems and Applications, 2010
... network problems such as traffic management for Internet Service Providers (ISPs), intrusion detection systems, Denial of Service (DoS) attack detection and automatically ... network trace [6]. Traffic classification has been a challenging problem as network applications are ...
The continual growth of high speed networks is a challenge for real-time network analysis systems. Real-time traffic classification is an issue for corporations and ISPs (Internet Service Providers). This work presents the design and implementation of a real-time flow-based network traffic classification system. The classifier monitor acts as a pipeline consisting of three modules: packet capture and pre-processing, flow reassembly, and classification with Machine Learning (ML). The modules are built as concurrent processes with well defined data interfaces between them so that any module can be improved and updated independently. In this pipeline, the flow reassembly function becomes the performance bottleneck. This implementation uses an efficient reassembly method that results in an average delivery delay of approximately 0.49 seconds. For the classification module, the performances of the K-Nearest Neighbor (KNN), C4.5 Decision Tree, Naive Bayes (NB), Flexible Naive Bayes (FNB) and AdaBoost Ensemble Learning algorithms are compared in order to validate our approach.