It is becoming increasingly difficult to know who is working on what, and how, in computational studies of Dialectal Arabic. This study charts the field through a systematic literature review intended to give insight into the most and least popular research areas, dialects, machine learning approaches, neural network input features, data types, datasets, system evaluation criteria, publication venues, and publication trends. The review is guided by the norms of systematic reviews and takes account of all research that adopted a computational approach to dialectal Arabic identification and detection and was published between 2000 and 2020. It collected, analyzed, and collated this research, discovered its trends, and identified research gaps. It revealed, inter alia, that research effort has not been directed evenly between speech and text or between the vernaculars; there is some bias favoring text over speech, regional varieties over individual vernaculars, and Egyptian over all other vernaculars. Furthermore, there is a clear preference for shallow machine learning approaches; for n-grams, TF-IDF, and MFCC as neural network features; and for accuracy as a statistical measure of validation of results. The paper also points to some glaring gaps in the research: (1) total neglect of Mauritanian and Bahraini in the continuous Arabic language area, and of such enclave varieties as Anatolian Arabic, Khuzistan Arabic, Khurasan Arabic, Uzbekistan Arabic, the Subsaharan Arabic of Nigeria and Chad, Djibouti Arabic, Cypriot Arabic, and Maltese; (2) scarcity of city dialect resources; (3) rarity of linguistic investigations that would complement this research; and (4) paucity of deep machine learning experimentation.
Journal of Management Information and Decision Sciences, 2021
Language disparities in Arabic-speaking individuals with Speech and Language Impairment (SLI) can increase the limitations faced by SLI specialists, and there is a need to improve neurofeedback interventions for this group. In contrast to studies that rely solely on behavioral measures, neurofeedback training can capture cognitive changes in response to learning that are not visible in time or accuracy measures alone. However, rigorous systematic literature reviews on neurofeedback interventions are limited. This research performs a systematic literature review of neurofeedback studies in which Electroencephalography (EEG) or Magnetic Resonance Imaging (MRI) measurements were used to isolate the effect of a particular behavioral intervention on SLI cognitive profiles. We reviewed studies on neurofeedback training published between 1992 and 2020, following PRISMA guidelines. Searching eight digital libraries (Springer Link, Science Direct, PLOS One, Taylor and Francis Online, Wiley Online Library, ISNR Online, SAGE Publishing, and Google Scholar) with combinations of the chosen keywords yielded 155 publications. Of these, 60 were duplicates and were removed; 50 papers ultimately formed the primary data for this systematic literature review. After a rigorous review, we observed that 36 of the 50 selected studies (72%) used EEG or MRI to analyze the effects of neurofeedback on individuals with SLI, and 38% performed a comparative analysis, which improved the quality of their studies. We recommend standardizing applications of EEG and MRI using BCI methods: EEG- and MRI-based neurofeedback training is less expensive, and yet more potent, than pre-existing behavioral intervention and assessment methods.
Covid19 is a newly discovered coronavirus that was officially declared a pandemic by the World Health Organization in March 2020. It is a new virus in the medical field with, as yet, no specific treatment and no vaccine. Covid19 is spreading very fast, and medical systems around the world are unable to hospitalize all patients, which leads to a significant increase in the number of deaths. This work uses machine learning models to predict which patients have a higher probability of death. Three algorithms were used: multilayer perceptron (MLP), support vector machine (SVM), and k-nearest neighbor (KNN). The accuracies achieved with MLP, SVM, and KNN were between 92% and 100%, with SVM achieving the highest accuracy. The models were evaluated using precision, accuracy, recall, and F-measure.
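As a rough illustration of the experimental setup described above, the following sketch trains and compares the three named classifiers on a labeled mortality dataset. The file name, the "died" label column, and the split ratio are assumptions for illustration; the actual dataset and preprocessing are not specified in the abstract.

```python
# Hypothetical sketch: train MLP, SVM, and KNN and compare the four metrics
# named in the abstract. Data file and column names are illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv("covid_patients.csv")            # hypothetical file
X, y = df.drop(columns=["died"]), df["died"]      # "died" is an assumed label column
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

models = {
    "MLP": MLPClassifier(max_iter=1000, random_state=42),
    "SVM": SVC(kernel="rbf", random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name,
          "acc=%.3f" % accuracy_score(y_te, pred),
          "prec=%.3f" % precision_score(y_te, pred),
          "rec=%.3f" % recall_score(y_te, pred),
          "f1=%.3f" % f1_score(y_te, pred))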
Today, the use of social networks is growing ceaselessly and rapidly. More alarming is the fact that these networks have become a substantial pool of unstructured data belonging to a host of domains, including business, government, and health. The increasing reliance on social networks calls for data mining techniques that are likely to facilitate reforming the unstructured data and placing them within a systematic pattern. The goal of the present survey is to analyze the data mining techniques that were utilized by social media networks between 2003 and 2015. Espousing criterion-based research strategies, 66 articles were identified to constitute the source of the present paper. After a careful review of these articles, we found that 19 data mining techniques have been used with social media data to address 9 different research objectives in 6 different industrial and services domains. However, the data mining applications in social media are still raw and require more effort b...
It is very important to conduct software estimation in the early stages of the software life cycle, because it helps managers bid on projects and allocate resources efficiently. This paper presents a novel regression model to estimate software effort based on the use case point size metric. The use case point model takes use case diagrams as input and gives the software size in use case points as output. The proposed effort equation takes into consideration the non-linear relationship between software size and software effort, as well as the influences of project complexity and productivity. Results show that software effort estimation accuracy can be improved by 16.5% using PRED(25) and 25% using PRED(35).
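For readers unfamiliar with the PRED criterion cited above: PRED(x) is the standard measure giving the fraction of projects whose magnitude of relative error (MRE) is at most x percent. The definition below is the conventional one; the sample effort values are illustrative only.

```python
# PRED(x): fraction of projects with MRE <= x%. Standard definition of the
# criterion named in the abstract; sample data are illustrative.
def mre(actual, estimated):
    return abs(actual - estimated) / actual

def pred(actuals, estimates, x):
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= x / 100.0)
    return hits / len(actuals)

actual_effort    = [120.0, 340.0, 95.0, 410.0]   # hypothetical person-hours
estimated_effort = [130.0, 300.0, 90.0, 520.0]
print("PRED(25) =", pred(actual_effort, estimated_effort, 25))
print("PRED(35) =", pred(actual_effort, estimated_effort, 35))
```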
Anomaly detection has been used for decades to identify and extract anomalous components from data, and many techniques have been applied to the task. One increasingly significant technique is Machine Learning (ML), which plays an important role in this area. In this paper, we conduct a Systematic Literature Review (SLR) that analyzes ML models for anomaly detection from four perspectives: the applications of anomaly detection, the ML techniques used, the performance metrics for ML models, and the classification of anomaly detection approaches. We identified 290 research articles, published between 2000 and 2020, that discuss ML techniques for anomaly detection. After analyzing the selected articles, we present 43 different applications of anomaly detection found in them, identify 29 distinct ML models used in the identification of anomalies, and list 22 datasets applied in anomaly detection experiments, alongside many other general datasets. In addition, we observe that unsupervised anomaly detection has been adopted by researchers more than other classes of anomaly detection. Detection of anomalies using ML models is a promising area of research, and many ML models have been implemented by researchers; we therefore provide recommendations and guidelines based on this review. Index Terms: anomaly detection, machine learning, security and privacy protection.
2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), 2020
Software effort estimation in the early stages of the software life cycle is one of the most essential and daunting tasks for project managers. In this research, a new model based on non-linear regression analysis is proposed to predict software effort from use case diagrams. We conclude that, where software size ranges from small to very large, a single linear or non-linear equation for effort estimation cannot be applied. Our model, with three different non-linear regression equations, can accommodate the different ranges of software size.
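One plausible reading of "three different non-linear regression equations" is a separate fit per size range. The sketch below fits a power-law effort equation to each of three assumed size bins using scipy; the functional form, the breakpoints, and the synthetic data are all assumptions, not the paper's actual model.

```python
# Hedged sketch: one power-law effort equation per assumed size range.
# Form, breakpoints, and data are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def power_law(size, a, b):
    return a * np.power(size, b)          # effort = a * size^b

rng = np.random.default_rng(0)
sizes = rng.uniform(10, 600, 200)                       # synthetic UCP sizes
effort = 8.0 * sizes ** 1.1 * rng.normal(1, 0.1, 200)   # synthetic effort

bins = [(10, 100), (100, 300), (300, 600)]   # assumed small/medium/large ranges
for lo, hi in bins:
    mask = (sizes >= lo) & (sizes < hi)
    (a, b), _ = curve_fit(power_law, sizes[mask], effort[mask], p0=(1.0, 1.0))
    print(f"size in [{lo},{hi}): effort = {a:.2f} * size^{b:.2f}")
```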
An important product measure for determining the effectiveness of software processes is defect density (DD). In this study, we propose the application of support vector regression (SVR) to predict the DD of new software projects obtained from the International Software Benchmarking Standards Group (ISBSG) Release 2018 data set. Two types of SVR (ε-SVR and ν-SVR) were applied to train and test on these projects, each with four types of kernels. The prediction accuracy of each SVR was compared to that of a statistical regression (i.e., simple linear regression, SLR). A statistical significance test showed that ν-SVR with a polynomial kernel was better than SLR when new software projects were developed on mainframes and coded in third-generation programming languages.
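A minimal sketch of the experimental grid above (two SVR variants, four kernels each) is shown below. Synthetic data stand in for the ISBSG projects, which are not freely redistributable; the features and scoring choice are assumptions.

```python
# Sketch of the eps-SVR / nu-SVR grid with four kernels each.
# Synthetic stand-ins for the ISBSG projects.
import numpy as np
from sklearn.svm import SVR, NuSVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))                                     # stand-in project features
y = X @ [0.5, -0.2, 0.8, 0.1] + rng.normal(scale=0.1, size=150)   # stand-in defect density

for name, cls in [("eps-SVR", SVR), ("nu-SVR", NuSVR)]:
    for kernel in ["linear", "poly", "rbf", "sigmoid"]:
        score = cross_val_score(cls(kernel=kernel), X, y,
                                scoring="neg_mean_absolute_error", cv=5).mean()
        print(f"{name:7s} {kernel:7s} MAE={-score:.3f}")
```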
2018 IEEE Global Communications Conference (GLOBECOM)
Network attacks have become very prevalent, and their rate is growing tremendously. Both organizations and individuals are concerned about the confidentiality, integrity, and availability of their critical information, which are often impacted by network attacks. To that end, several machine learning-based intrusion detection methods have been developed to secure network infrastructure from such attacks. In this paper, an effective anomaly detection framework is proposed that utilizes Bayesian Optimization to tune the parameters of the Support Vector Machine with Gaussian kernel (SVM-RBF), Random Forest (RF), and k-Nearest Neighbor (k-NN) algorithms. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in terms of accuracy, precision, recall, and low false alarm rate.
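As an illustration of Bayesian hyperparameter tuning for the SVM-RBF detector, the sketch below uses scikit-optimize's BayesSearchCV. The library choice is an assumption (the paper does not prescribe an implementation), and the data are synthetic stand-ins for ISCX 2012 flow features.

```python
# Illustrative Bayesian tuning of SVM-RBF with scikit-optimize (assumed
# library). Synthetic stand-ins for network-flow features and attack labels.
import numpy as np
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 6))                    # stand-in flow features
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)     # stand-in attack labels

opt = BayesSearchCV(
    SVC(kernel="rbf"),
    {"C": Real(1e-2, 1e3, prior="log-uniform"),
     "gamma": Real(1e-4, 1e1, prior="log-uniform")},
    n_iter=25, cv=3, random_state=7,
)
opt.fit(X, y)
print("best params:", opt.best_params_, "cv accuracy:", opt.best_score_)
```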
The Use Case Points (UCP) method has been around for over two decades. Although there has been substantial criticism of the algebraic construction and factor assessment of UCP, it remains an efficient early size estimation method. Predicting software effort from UCP is still an ever-present challenge. The earlier version of the UCP method suggested using productivity as a cost driver, with fixed or a few pre-defined productivity ratios widely agreed upon. While this approach was successful when not enough historical data were available, it is no longer acceptable, because software projects differ in their development aspects. It is therefore better to understand the relationship between productivity and other UCP variables. This paper examines the impact of data locality approaches on productivity and effort prediction from multiple UCP variables. The environmental factors are used as partitioning factors to produce locally homogeneous data, either based on their influence levels or using clustering algorithms. Different machine learning methods, including solo and ensemble methods, are used to construct productivity and effort prediction models based on the local data. The results demonstrate that prediction models created from local data surpass models that use the entire data. The results also show that confirming the hypothesized relationship between productivity and environmental factors is not necessarily a requirement for the success of locality.
The popularity and usage of Cloud computing are increasing rapidly. Several companies are investing in this field, either for their own use or to provide it as a service for others. One result of Cloud development is the emergence of various security problems for both industry and consumers. One way to secure the Cloud is by using Machine Learning (ML); ML techniques have been used in various ways to prevent or detect attacks and security gaps in the Cloud. In this paper, we provide a Systematic Literature Review (SLR) of ML and Cloud security methodologies and techniques. We analyzed 63 relevant studies, and the results of the SLR are categorized into three main research areas: (i) the different types of Cloud security threats, (ii) the ML techniques used, and (iii) the performance outcomes. We defined 11 Cloud security areas; distributed denial-of-service (DDoS) and data privacy are the most common, appearing in 16% and 14% of the studies respectively. We found 30 ML techniques in use, some in hybrid models and others standalone; the most popular is SVM, in both hybrid and standalone models. Furthermore, 60% of the papers compared their models with others to prove the efficiency of their proposed model. We enumerated 13 different evaluation metrics; the most applied is true positive rate, and the least used is training time. Lastly, of the 20 datasets found, KDD and KDD CUP'99 are the most used among the relevant studies.
2021 IEEE/ACM 13th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE)
Software testing is one of the crucial supporting processes of the software life cycle. Unfortunately for the software industry, the role is stigmatized, partly due to misperception and partly due to the treatment of the testing role within the industry. The present study aims to analyse this situation and explore what might inhibit an individual from taking up a software testing career. To investigate this issue, we surveyed 132 senior students pursuing a degree in computer science and information and communication technology (ICT) related areas at three universities in the United Arab Emirates: UAE University in Al Ain, Sharjah University in Sharjah, and New York University in Abu Dhabi. The students were asked to describe the PROs and CONs of taking up a career in software testing, and the likelihood that they would take up such a career themselves. The study identified 7 main PROs and 9 main CONs for pursuing a testing career, and indicated that the role of software tester is perceived as a social role that may require as many soft skills as technical knowledge. The results also show that the UAE students have a stronger negative attitude towards software testing than their counterparts in similar investigations carried out in other countries over the past three years.
2016 3rd International Conference on Soft Computing & Machine Intelligence (ISCMI)
Software effort estimation influences almost every software development process, including bidding, planning, and budgeting. Hence, delivering an accurate estimate in the early stages of the software life cycle may be key to the success of any project. To this end, many solo techniques have been proposed to predict the effort required to develop a software system; nevertheless, none has proved suitable in all circumstances. Recently, Ensemble Effort Estimation (EEE), which generates a software effort estimate by combining more than one solo estimation technique by means of a combination rule, has been investigated. In this study, a heterogeneous EEE based on four machine learning techniques was investigated using three linear combination rules and two well-known datasets. The results suggest that the proposed heterogeneous EEE yields very promising performance, and that no single combiner rule can be recommended as best.
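To make the combination-rule idea concrete, the sketch below combines several solo effort estimates with three linear rules. Mean, median, and inverse-rank weighted mean are common choices in this literature and are assumed here for illustration; the abstract does not name the three rules actually used.

```python
# Sketch of linear combination rules for a heterogeneous effort ensemble.
# The three rules shown (mean, median, IRWM) are assumed examples.
import numpy as np

def combine(predictions, rule="mean"):
    """Combine per-model effort predictions (rows = models) into one estimate."""
    p = np.asarray(predictions, dtype=float)
    if rule == "mean":
        return p.mean(axis=0)
    if rule == "median":
        return np.median(p, axis=0)
    if rule == "irwm":  # inverse-rank weighted mean: best-ranked model weighs most
        ranks = np.arange(len(p), 0, -1)       # assumes rows are sorted best-first
        return (ranks[:, None] * p).sum(axis=0) / ranks.sum()
    raise ValueError(rule)

# Four solo estimates (e.g., from four ML techniques) for three projects:
solo = [[110, 300, 95], [120, 320, 100], [105, 290, 90], [130, 310, 99]]
for rule in ("mean", "median", "irwm"):
    print(rule, combine(solo, rule))
```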
Predicting students' academic performance has been a research area of interest in recent years, with many institutions focusing on improving student performance and education quality. The analysis and prediction of student performance can be achieved using various data mining techniques, which also allow instructors to determine possible factors that may affect students' final marks. To that end, this work analyzes two different undergraduate datasets from two different universities and aims to predict student performance at two stages of course delivery (20% and 50% respectively). This analysis allows the appropriate machine learning algorithms to be properly chosen and their parameters optimized. Furthermore, this work adopts a systematic multi-split approach based on the Gini index and p-value, optimizing a suitable bagging ensemble learner built from any combination of six potential base machine learning algorithms. Experimental results show that the posited bagging ensemble models achieve high accuracy for the target group in both datasets.
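A hedged sketch of the bagging idea above follows: a bagging ensemble is built over each candidate base learner and compared by cross-validated accuracy. The candidate set, the synthetic data, and the cross-validation setup are illustrative assumptions, not the paper's six algorithms or its multi-split procedure.

```python
# Hedged sketch: bagging ensembles over candidate base learners.
# Candidates and data are illustrative stand-ins for the student datasets.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=3)

candidates = {
    "tree": DecisionTreeClassifier(random_state=3),
    "nb":   GaussianNB(),
    "knn":  KNeighborsClassifier(),
}
for name, base in candidates.items():
    bag = BaggingClassifier(estimator=base, n_estimators=30, random_state=3)
    acc = cross_val_score(bag, X, y, cv=5).mean()
    print(f"bagged {name}: accuracy={acc:.3f}")
```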
2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), Nov 1, 2019
Speaker verification accuracy in emotional talking environments is not as high as it is in neutral ones. This work aims at accepting or rejecting a claimed speaker by his/her voice in emotional environments, using the Third-Order Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An Emirati-accented (Arabic) speech database, with Mel-Frequency Cepstral Coefficients (MFCCs) as the extracted features, was used to evaluate our work. Our results demonstrate that speaker verification accuracy based on CSPHMM3 is greater than that based on state-of-the-art classifiers and models such as the Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector Quantization (VQ).
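For readers unfamiliar with the feature named above, here is a minimal MFCC extraction sketch using librosa. The library choice, file name, and frame settings are assumptions; the Emirati-accented corpus itself is not public.

```python
# Minimal MFCC extraction sketch with librosa (assumed library choice).
# The file name and settings are illustrative.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)       # hypothetical recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # 13 coefficients per frame
print(mfcc.shape)  # (13, n_frames): one feature vector per analysis frame
```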
2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Nov 1, 2018
Coronary Artery Disease (CAD) is one of the leading causes of death worldwide, so it is very important to correctly diagnose patients with the disease. Machine learning is a useful tool for medical diagnosis; however, features and algorithms must be carefully selected to obtain accurate classification. To this end, three feature selection methods were applied to the 13 input features of the Cleveland dataset (297 entries), and 7 features were selected. The selected features were used to train three different classifiers, SVM, Naïve Bayes, and KNN, using 10-fold cross-validation. The resulting models were evaluated using accuracy, recall, specificity, and precision. The Naïve Bayes classifier performs best on this dataset and feature set, outperforming or matching SVM and KNN on all four evaluation parameters and achieving an accuracy of 84%.
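A sketch of the pipeline described above follows: select 7 of 13 features, then evaluate the three classifiers with 10-fold cross-validation. The univariate selector is only one of several plausible selection methods, and synthetic data stand in for the Cleveland records.

```python
# Sketch: select 7 of 13 features, then evaluate SVM, NB, and KNN with
# 10-fold CV. Selector choice and data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=297, n_features=13, random_state=5)
X_sel = SelectKBest(f_classif, k=7).fit_transform(X, y)   # keep 7 features

for name, clf in [("SVM", SVC()), ("NB", GaussianNB()), ("KNN", KNeighborsClassifier())]:
    acc = cross_val_score(clf, X_sel, y, cv=10).mean()
    print(f"{name}: mean 10-fold accuracy = {acc:.3f}")
```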
2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Dec 1, 2018
It is well recognized that project productivity is a key driver in estimating software project effort from the Use Case Point size metric at early software development stages. Although a few models for predicting productivity have been proposed, there is no consistent conclusion regarding which model is superior. Therefore, instead of building a new productivity prediction model, this paper presents a new ensemble construction mechanism applied to software project productivity prediction; ensembling is an effective technique when the performance of the base models is poor. We propose a weighted mean method that aggregates the predicted productivities based on the average error each model produces on the training data. The results show that using an ensemble is a good alternative when the accuracies of the base models are not consistently good across different datasets and when the models behave diversely.
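One plausible reading of the weighted-mean rule above is to make each base model's weight inversely proportional to its average training error, so that more accurate models contribute more. The sketch below implements that reading; the weighting scheme and the numbers are assumptions for illustration.

```python
# Hedged sketch of an inverse-error weighted mean: w_i proportional to 1/e_i.
# This is one plausible reading of the paper's rule; values are illustrative.
import numpy as np

def weighted_mean_prediction(predictions, train_errors):
    errors = np.asarray(train_errors, dtype=float)
    weights = (1.0 / errors) / (1.0 / errors).sum()      # w_i = (1/e_i) / sum(1/e_j)
    return weights @ np.asarray(predictions, dtype=float)

preds  = [0.42, 0.55, 0.48]   # productivity predicted by three base models
errors = [0.10, 0.25, 0.15]   # each model's mean absolute training error
print(weighted_mean_prediction(preds, errors))
```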
Smartphones are becoming necessary tools in the daily lives of millions of users who rely on these devices and their applications. There are thousands of applications for smartphone devices such as the iPhone, Blackberry, and Android, so their reliability has become paramount for their users. This work aims to answer two related questions: (1) Can we assess the reliability of mobile applications using traditional reliability models? (2) Can we adequately model the failure data collected from many users? First, it is shown that the three most used software reliability models fall short of the mark when applied to smartphone applications; their failures were traced back to specific features of mobile applications. Second, it is demonstrated that the Weibull and Gamma distribution models can adequately fit the observed failure data, thus providing better means to predict the reliability of smartphone applications.
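As a rough sketch of the distribution-fitting step above, the code below fits Weibull and Gamma distributions to inter-failure times with scipy and compares the fits by log-likelihood. The synthetic data stand in for the collected smartphone-application failures, and fixing the location parameter at zero is an assumption.

```python
# Hedged sketch: fit Weibull and Gamma to inter-failure times with scipy
# and compare by log-likelihood. Data are synthetic stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
inter_failure_times = rng.weibull(1.3, 200) * 50.0   # synthetic hours between failures

for name, dist in [("Weibull", stats.weibull_min), ("Gamma", stats.gamma)]:
    params = dist.fit(inter_failure_times, floc=0)   # location fixed at zero (assumption)
    loglik = np.sum(dist.logpdf(inter_failure_times, *params))
    print(f"{name}: params={tuple(round(p, 3) for p in params)} loglik={loglik:.1f}")
```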