The research community has begun looking for IP traffic classification techniques that do not rely on 'well known' TCP or UDP port numbers, or on interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification, an interdisciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the employment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.
Literature on the use of machine learning (ML) algorithms for classifying IP traffic has relied on full flows or the first few packets of flows. In contrast, many real-world scenarios require a classification decision well before a flow has finished, even if the flow's beginning is lost. This implies that classification must be achieved using statistics derived from the most recent N packets taken at any arbitrary point in a flow's lifetime. We propose training the classifier on a combination of short sub-flows (extracted from full-flow examples of the target application's traffic). We demonstrate this optimisation using the Naïve Bayes ML algorithm, and show that our approach results in excellent performance even when classification is initiated mid-way through a flow, with windows as small as 25 packets long. We suggest future use of unsupervised ML algorithms to identify optimal sub-flows for training.
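The sub-flow idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the three window statistics, the synthetic flows, the helper names (`window_features`, `sub_flows`, `synthetic_flow`) and the small hand-written Gaussian Naïve Bayes are all hypothetical. Only the 25-packet window size comes from the abstract.

```python
import math
from statistics import mean, pstdev

def window_features(packets):
    """Summarise a window of (timestamp_s, length_bytes) packets.
    The three statistics are illustrative choices, not the paper's feature set."""
    lengths = [length for _, length in packets]
    gaps = [b[0] - a[0] for a, b in zip(packets, packets[1:])]
    return (mean(lengths), pstdev(lengths), mean(gaps))

def sub_flows(flow, n, step):
    """Yield each window of n consecutive packets, advancing by step packets."""
    for start in range(0, len(flow) - n + 1, step):
        yield flow[start:start + n]

class GaussianNB:
    """Tiny Gaussian Naive Bayes: one mean and std per class and feature."""
    def fit(self, rows, labels):
        self.priors, self.stats = {}, {}
        for label in set(labels):
            members = [r for r, l in zip(rows, labels) if l == label]
            self.priors[label] = len(members) / len(rows)
            # Floor the std so a constant feature does not divide by zero.
            self.stats[label] = [(mean(c), max(pstdev(c), 1e-6))
                                 for c in zip(*members)]
        return self

    def predict(self, row):
        def log_posterior(label):
            total = math.log(self.priors[label])
            for x, (mu, sigma) in zip(row, self.stats[label]):
                total += -math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)
            return total
        return max(self.stats, key=log_posterior)

def synthetic_flow(base_len, jitter, gap, count):
    """Deterministic stand-in for a captured flow (hypothetical data)."""
    return [(i * gap, base_len + (i * 7) % jitter) for i in range(count)]

N = 25  # window size in packets, as mentioned in the abstract
game = synthetic_flow(60, 40, 0.02, 200)      # small packets, slow pacing
bulk = synthetic_flow(1400, 100, 0.001, 200)  # large packets, fast pacing

# Train on sub-flows cut from full-flow examples rather than on whole flows.
rows, labels = [], []
for name, flow in (("game", game), ("bulk", bulk)):
    for window in sub_flows(flow, N, step=N):
        rows.append(window_features(window))
        labels.append(name)
model = GaussianNB().fit(rows, labels)

# Classify from an arbitrary mid-flow position; the flow start is never seen.
mid_game = synthetic_flow(60, 40, 0.02, 300)[137:137 + N]
mid_bulk = synthetic_flow(1400, 100, 0.001, 300)[61:61 + N]
game_label = model.predict(window_features(mid_game))
bulk_label = model.predict(window_features(mid_bulk))
```

Because the training rows come from windows at many offsets, the classifier never depends on seeing the first packets of a flow, which is the property the abstract emphasises.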
Automated Traffic Classification and Application Identification using Machine Learning
The dynamic classification and identification of network applications responsible for network traffic flows offers substantial benefits to a number of key areas in IP network engineering, management and surveillance. Currently such classifications rely on selected packet header fields (eg ...
A number of key areas in IP network engineering, management and surveillance greatly benefit from the ability to dynamically identify traffic flows according to the applications responsible for their creation. Currently such classifications rely on selected packet header fields (e.g. destination port) or application layer protocol decoding. These methods have a number of shortfalls: for example, many applications can use unpredictable port numbers, and protocol decoding requires high resource usage or is simply infeasible when protocols are unknown or encrypted. We propose a framework for application classification using an unsupervised machine learning (ML) technique. Flows are automatically classified based on their statistical characteristics. We also propose a systematic approach to identify an optimal set of flow attributes to use, and evaluate the effectiveness of our approach using captured traffic traces.
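As a rough illustration of grouping flows by their statistical characteristics without labels, the sketch below clusters a handful of flows with k-means. This is a stand-in technique chosen for brevity: the abstract does not name the unsupervised algorithm used, and the flow attributes, the sample values and the function names (`normalise`, `kmeans`) are hypothetical.

```python
import math

def normalise(rows):
    """Scale each attribute to [0, 1] so no single unit dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((x - lo) / s for x, lo, s in zip(r, lows, spans))
            for r in rows]

def kmeans(rows, k, iterations=20):
    """Plain k-means; seeded with the first k rows to stay deterministic."""
    centroids = list(rows[:k])
    assignment = []
    for _ in range(iterations):
        # Assign every flow to its nearest centroid (Euclidean distance).
        assignment = [min(range(k), key=lambda c: math.dist(r, centroids[c]))
                      for r in rows]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assignment

# Hypothetical per-flow attributes:
# (mean packet length in bytes, mean inter-arrival time in s, duration in s)
flows = [
    (80, 0.020, 300),   # interactive-looking
    (1450, 0.001, 12),  # bulk-transfer-looking
    (75, 0.025, 280),
    (1400, 0.002, 15),
    (90, 0.018, 320),
    (1480, 0.001, 9),
]

clusters = kmeans(normalise(flows), k=2)
```

The cluster indices carry no application names by themselves; in practice someone would still have to inspect each cluster to decide which application it most likely corresponds to.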
Background. Obesity is a major health problem. Although heritability is substantial, genetic mechanisms predisposing to obesity are not very well understood. We have performed a genome wide association study (GWA) for early onset (extreme) obesity. Methodology/Principal Findings. a) GWA (Genome-Wide Human SNP Array 5.0 comprising 440,794 single nucleotide polymorphisms) for early onset extreme obesity based on 487 extremely obese young German individuals and 442 healthy lean German controls; b) confirmatory analyses on 644 independent families with at least one obese offspring and both parents. We aimed to identify and subsequently confirm the 15 SNPs (minor allele frequency ≥10%) with the lowest p-values of the GWA by four genetic models: additive, recessive, dominant and allelic. Six single nucleotide polymorphisms (SNPs) in FTO (fat mass and obesity associated gene) within one linkage disequilibrium (LD) block including the GWA SNP rendering the lowest p-value (rs1121980; log-additive model: nominal p = 1.13 × 10⁻⁷, corrected p = 0.0494; odds ratio (OR) CT 1.67, 95% confidence interval (CI) 1.22-2.27; OR TT 2.76, 95% CI 1.88-4.03) belonged to the 15 SNPs showing the strongest evidence for association with obesity. For confirmation we genotyped 11 of these in the 644 independent families (of the six FTO SNPs we chose only two representing the LD block). For both FTO SNPs the initial association was confirmed (both Bonferroni corrected p < 0.01). However, none of the nine non-FTO SNPs revealed significant transmission disequilibrium. Conclusions/Significance. Our GWA for extreme early onset obesity substantiates that variation in FTO strongly contributes to early onset obesity. This is a further proof of concept for GWA to detect genes relevant for highly complex phenotypes. We concurrently show that nine additional SNPs with initially low p-values in the GWA were not confirmed in our family study, thus suggesting that of the best 15 SNPs in the GWA only the FTO SNPs represent true positive findings.
Citation: Hinney A, Nguyen TT, Scherag A, Friedel S, Brönner G, et al (2007) Genome Wide Association (GWA) Study for Early Onset Extreme Obesity Supports the Role of Fat Mass and Obesity Associated Gene (FTO) Variants. PLoS ONE 2(12): e1361.
Maori are over-represented in all stages of the criminal justice system, including parole. The primary purpose of the New Zealand Parole Board is to assess whether an offender poses an undue risk to the safety of the community. There is no reference within the Parole Act to taking the Treaty of Waitangi into consideration, nor is there any clear direction on how the decision maker must take into account the principles of the Treaty of Waitangi. It is clear, however, that the New Zealand Parole Board must adequately accommodate Maori cultural concepts, values and practices within its general process, including hearings. In analysing comparative jurisdictions and models, this paper recommends an Indigenous Re-entry Court as a suitable vehicle to address the disproportionate statistics.
Papers by thuy nguyen