Blood cancer has been a growing concern during the last decade and requires early diagnosis to start proper treatment. The diagnosis process is costly and time-consuming, involving medical experts and several tests. Thus, an automatic diagnosis system for its accurate prediction is of significant importance. Diagnosis of blood cancer using leukemia microarray gene data and machine learning has become an important area of medical research. Despite research efforts, the desired accuracy and efficiency have not yet been reached and further enhancements are needed. This study proposes an approach for blood cancer prediction using supervised machine learning. For the current study, the leukemia microarray gene dataset containing 22,283 genes is used. ADASYN resampling and Chi-squared (Chi2) feature selection techniques are used to resolve the imbalanced and high-dimensional dataset problems. ADASYN generates artificial data to make the dataset balanced for each target class, and Chi2 selects the...
Rapid urbanization to meet the needs of the growing population has led to several challenges such as pollution, increased and congested traffic, poor sustainability, and impact on the ecological environment. The concept of smart cities comprising intelligent convergence systems has been regarded as a potential solution to overcome these problems. Based on information and communications technology (ICT), the idea of a smart city has emerged to decrease the impact of rapid urbanization. In this context, important efforts have been made to make cities smarter and more sustainable. However, the challenges associated with the implementation and evaluation of smart cities in developing countries have not been examined appropriately, particularly in the Moroccan context. To analyze the efficacy and success of such efforts, evaluation and comparison using common frameworks are important. For this purpose, the present research aims to investigate and evaluate the mos...
The spread of altered media in the form of fake videos, audio, and images has increased considerably over the past few years. Advanced digital manipulation tools and techniques make it easier to generate fake content and post it on social media. In addition, tweets with deep fake content make their way to social platforms. The polarity of such tweets is important for determining people's sentiment about deep fakes. This paper presents a deep learning model to predict the polarity of deep fake tweets. For this purpose, a stacked bi-directional long short-term memory (SBi-LSTM) network is proposed to classify the sentiment of deep fake tweets. Several well-known machine learning classifiers are investigated as well, such as support vector machine, logistic regression, Gaussian Naive Bayes, extra tree classifier, and AdaBoost classifier. These classifiers are used with term frequency-inverse document frequency and bag-of-words feature extraction approaches. Besides, the perf...
Social media platforms and microblogging websites have gained accelerated popularity during the past few years. These platforms are used for expressing views and opinions about products, personalities, and events. Often during discussions and debates, fights take place on social media platforms which involve rude, disrespectful, and hateful comments, called toxic comments. The identification of toxic comments has been regarded as an essential element for social media platforms. This study introduces an ensemble approach, called regression vector voting classifier (RVVC), to identify toxic comments on social media platforms. The ensemble merges logistic regression and a support vector classifier under soft voting criteria. Several experiments are performed on the imbalanced and balanced datasets to analyze the performance of the proposed approach. For data balancing, the synthetic minority oversampling technique (SMOTE) is applied to the imbalanced dataset. Furthermore, two feature extraction approaches, term frequency-inverse document frequency (TF-IDF) and bag-of-words (BoW), are used to investigate their suitability. The performance of the proposed approach is compared with several machine learning classifiers using accuracy, precision, recall, and F1-score. Results suggest that RVVC outperforms all other individual models when TF-IDF features are used with the SMOTE-balanced dataset and achieves an accuracy of 0.97.
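The soft voting rule mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each base model already outputs per-class probabilities; the function name `soft_vote`, the labels, and the probability values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def soft_vote(prob_sets: List[List[Dict[str, float]]]) -> List[str]:
    """Average per-class probabilities across models and pick the most probable class."""
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    predictions = []
    for i in range(n_samples):
        totals: Dict[str, float] = {}
        for model_probs in prob_sets:
            for label, p in model_probs[i].items():
                totals[label] = totals.get(label, 0.0) + p
        # Dividing by the model count keeps the values interpretable as probabilities.
        averaged = {label: s / n_models for label, s in totals.items()}
        predictions.append(max(averaged, key=averaged.get))
    return predictions


# Hypothetical probabilities from two base models for three comments.
lr_probs = [
    {"toxic": 0.90, "clean": 0.10},
    {"toxic": 0.40, "clean": 0.60},
    {"toxic": 0.55, "clean": 0.45},
]
svc_probs = [
    {"toxic": 0.80, "clean": 0.20},
    {"toxic": 0.30, "clean": 0.70},
    {"toxic": 0.20, "clean": 0.80},
]
print(soft_vote([lr_probs, svc_probs]))
```

In the third comment the first model alone would predict "toxic", but the averaged probabilities favor "clean", which is the kind of disagreement soft voting is meant to resolve.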
Regular inspection of railway track health is crucial for maintaining safe and reliable train operations. Factors such as cracks, ballast issues, rail discontinuity, loose nuts and bolts, burnt wheels, superelevation, and misalignment developed on the rails due to non-maintenance, pre-emptive investigations and delayed detection pose a grave danger to the safe operation of rail transport. The traditional procedure of manually inspecting the rail track using a railway cart is both inefficient and prone to human error and bias. In a country like Pakistan, where train accidents have taken many lives, automating such inspections is a natural step to avoid accidents and save lives. This study aims at enhancing the traditional railway cart system to address these issues by introducing an automatic railway track fault detection system using acoustic analysis. In this regard, this study makes two important contributions: data collection on Pakistan railway trac...
Wireless capsule endoscopy (WCE) is an efficient tool to investigate gastrointestinal tract disorders and perform painless imaging of the intestine. Despite that, several concerns, such as efficacy, tolerance, safety, and performance, make its wide applicability and adoption challenging. Besides, automatic analysis of the datasets produced by WCE is of great importance for detecting abnormalities. Imaging of the patient's digestive tract through WCE produces a large dataset that requires a substantial amount of time and a special skill set from a medical practitioner for analysis. Several computer-aided and vision-based solutions have been proposed to resolve these issues, yet they do not provide the desired level of accuracy and further improvements are still needed. The current study aims to devise a system that can automatically analyze WCE images to identify abnormalities and assist practitioners in robust diagnosis. This study adopts a deep neural network approach and proposes a model named BIR (bleedy image recognizer) that combines MobileNet with a custom-built convolutional neural network (CNN) model to classify WCE bleedy images. BIR uses the MobileNet model for initial-level computation because of its lower computation power requirement, and the output is subsequently fed to the CNN for further processing. A dataset of 1650 WCE images is used to train and test the model, and accuracy, precision, recall, F1 score, and Cohen's kappa are used to evaluate the performance of BIR. Results are promising, with accuracy, precision, recall, F1 score, and Cohen's kappa of 0.993, 1.000, 0.994, 0.997, and 0.995, respectively. The accuracy of the BIR model is 0.978 on a WCE image dataset collected from Google, which is better than the state-of-the-art approaches.
INDEX TERMS Wireless capsule endoscopy, deep learning, computer vision, gastrointestinal tract infection, classification, convolutional neural networks.
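The evaluation measures named in the WCE abstract above (accuracy, precision, recall, F1 score, and Cohen's kappa) can all be derived from a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the counts are hypothetical and unrelated to the reported results.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute common binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes occur and
    at least one positive prediction is made.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa compares observed agreement with agreement expected by chance.
    p_observed = accuracy
    p_chance_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_chance_neg = ((fn + tn) / total) * ((fp + tn) / total)
    p_expected = p_chance_pos + p_chance_neg
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is useful next to accuracy because it discounts agreement that a classifier would reach by chance, which matters when one class dominates.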
Sarcasm is a common phenomenon across social networking sites because people express their negative thoughts, hatred, and opinions using positive vocabulary, which makes sarcasm detection a challenging task. Although various studies have investigated sarcasm detection on baseline datasets, this work is the first to detect sarcasm from a multi-domain dataset that is constructed by combining Twitter and News Headlines datasets. This study proposes a hybrid approach where convolutional neural networks (CNN) are used for feature extraction while a long short-term memory (LSTM) network is trained and tested on those features. For performance analysis, several machine learning algorithms such as random forest, support vector classifier, extra tree classifier, and decision tree are used. The performance of both the proposed model and the machine learning algorithms is analyzed using the term frequency-inverse document frequency, bag of words approach, and global vectors for word repr...
The spread of Covid-19 has resulted in worldwide health concerns. Social media is increasingly used to share news and opinions about it. A realistic assessment of the situation is necessary to utilize resources optimally and appropriately. In this research, we perform sentiment analysis of Covid-19 tweets using a supervised machine learning approach. Identification of Covid-19 sentiments from tweets would allow informed decisions for better handling of the current pandemic situation. The dataset used is extracted from Twitter using IDs provided by IEEE DataPort. Tweets are extracted by an in-house crawler that uses the Tweepy library. The dataset is cleaned using preprocessing techniques and sentiments are extracted using the TextBlob library. The contribution of this work is the performance evaluation of various machine learning classifiers using our proposed feature set. This set is formed by concatenating the bag-of-words and the term frequency-inverse document freque...
Machine learning (ML) based forecasting mechanisms have proved their significance in anticipating perioperative outcomes to improve decision making on the future course of action. ML models have long been used in many application domains that need the identification and prioritization of adverse factors for a threat. Several prediction methods are popularly used to handle forecasting problems. This study demonstrates the capability of ML models to forecast the number of upcoming patients affected by COVID-19, which is presently considered a potential threat to mankind. In particular, four standard forecasting models, namely linear regression (LR), least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and exponential smoothing (ES), have been used in this study to forecast the threatening factors of COVID-19. Three types of predictions are made by each of the models: the number of newly infected cases, the number of deaths, and the number of recoveries in the next 10 days. The results show that using these methods is a promising mechanism for the current scenario of the COVID-19 pandemic. ES performs best among all the models used, followed by LR and LASSO, which perform well in forecasting the new confirmed cases, death rate, and recovery rate, while SVM performs poorly in all the prediction scenarios given the available dataset.
INDEX TERMS COVID-19, exponential smoothing method, future forecasting, adjusted R² score, supervised machine learning.
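To make the exponential smoothing (ES) forecaster named in the abstract above concrete, here is a minimal sketch of one common variant, Holt's linear-trend (double) exponential smoothing. The abstract does not state which ES variant or parameters were used, so the choice of variant, the function name `holt_forecast`, the smoothing factors `alpha` and `beta`, and the case counts are all assumptions for illustration.

```python
from typing import List


def holt_forecast(series: List[float], horizon: int,
                  alpha: float = 0.5, beta: float = 0.5) -> List[float]:
    """Forecast `horizon` future points with Holt's linear-trend exponential smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the first difference.
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the newest change in level with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the last level along the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical daily counts, not data from the study.
daily_cases = [100, 120, 145, 160, 190, 210, 240]
print(holt_forecast(daily_cases, horizon=10))
```

A 10-step horizon mirrors the 10-day forecast window described in the abstract; in practice `alpha` and `beta` would be tuned on held-out data rather than fixed.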
Amid the worldwide COVID-19 pandemic lockdowns, the closure of educational institutes led to an unprecedented rise in online learning. To limit the impact of COVID-19 and curb its spread, educational institutions closed their campuses immediately and academic activities were moved to e-learning platforms. The effectiveness of e-learning is a critical concern for both students and parents, specifically in terms of its suitability to students and teachers and its technical feasibility with respect to different social scenarios. Such concerns must be reviewed from several aspects before e-learning can be adopted at such a large scale. This study endeavors to investigate the effectiveness of e-learning by analyzing the sentiments of people about e-learning. Due to the recent rise of social media as an important mode of communication, people’s views can be found on platforms such as Twitter, Instagram, Facebook, etc. This study uses a Twitter dataset containing 17,155 ...
The recent development of machines exhibiting intelligent characteristics involves numerous techniques including computer hardware and software architecture development. Many different hardware devices, wearable sensors, and machine learning and deep learning model implementations have recently been applied in human activity recognition (HAR) applications. However, developing high-accuracy classification systems for activity recognition using low-cost hardware technology is of significant importance. To achieve this goal, this study uses sensor data from two low-cost sensors, a gyroscope and an accelerometer, along with an Artificial Neural Network (ANN) based deep learning model for HAR. In particular, a Deep Stacked Multilayered Perceptron (DS-MLP) is proposed. In the implementation of DS-MLP, an ANN model is used as a meta-learner while five MLP models are used as base-learners. These base-learners and the meta-learner are combined using a stack ensemble technique. The performance evaluation is done first on the individual base-models and then on DS-MLP; the results show high accuracy of 97.3% and 99.4% on the heterogeneous datasets used for testing. The performance of the proposed DS-MLP model is compared with some existing machine learning classifiers and several state-of-the-art activity recognition systems. The comparative analysis also shows that the proposed system performs better than these classification approaches in terms of important performance metrics such as accuracy, precision, recall, F-score, Cohen's kappa, and Matthews correlation coefficient.
In recent years, the classification of class-imbalanced data has received increasing attention across different scientific areas such as fraud detection, metabolomics, cancer diagnosis, etc. This interest follows evidence of the negative effect of class overlap on the performance of class-imbalanced learning. Based on the augmented R-value, our proposed strategy aims to select features that give the data the minimal degree of overlap, thereby improving classification performance as well. In this context, we present three feature selection algorithms, RONS (Reduce Overlapping with No-sampling), ROS (Reduce Overlapping with SMOTE), and ROA (Reduce Overlapping with ADASYN), which are built through sparse feature selection to minimize the overlap and perform binary classification. Also, a re-sampling process is included in both ROS and ROA. Simulation results show that our proposed algorithms, as feature selection methods, manage the variation of the false discovery rate during the selection of main features for the process modeling. For the experiment, four credit card datasets have been selected to test the performance of our algorithms. Using the F-measure and G-mean evaluation metrics, the results reveal that our proposed algorithms compare favorably with classical feature selection methods. Besides, this feature selection strategy can be extended as an alternative for dealing with class-imbalanced learning problems that involve overlapping.
Artificial intelligence (AI) techniques in general and convolutional neural networks (CNNs) in particular have attained successful results in medical image analysis and classification. A deep CNN architecture is proposed in this paper for the diagnosis of COVID-19 based on chest X-ray image classification. Due to the unavailability of a sufficiently large, good-quality chest X-ray image dataset, effective and accurate CNN classification was a challenge. To deal with these complexities, such as a very small and imbalanced dataset with image-quality issues, the dataset has been preprocessed in different phases using different techniques to obtain an effective training dataset for the proposed CNN model to attain its best performance. The preprocessing stages performed in this study include dataset balancing, medical experts’ image analysis, and data augmentation. The experimental results have shown the overall accuracy as high as 9...
App stores usually allow users to give reviews and ratings that are used by developers to resolve issues and make plans for their apps. In this way, these app stores collect large amounts of data for analysis. However, there are several challenges related to redundancy and the volume of data that must first be addressed by using machine learning. This study performs experiments on a dataset that contains reviews for Shopify apps. To overcome the aforementioned limitations, we first categorize user reviews into two groups, i.e., happy and unhappy, and then perform preprocessing on the reviews to clean the data. At a later stage, several feature engineering techniques, such as bag-of-words, term frequency-inverse document frequency (TF-IDF), and chi-square (Chi2), are used singly and in combination to preserve meaningful information. Finally, the random forest, AdaBoost classifier, and logistic regression models are used to classify the reviews as happy or unhappy. The performance of our proposed pipeline was evaluated using average accuracy, precision, recall, and F1 score. The experiments reveal that a combination of features can improve the performance of machine learning models; in this study, logistic regression outperforms the others and achieves an 83% true acceptance rate when combined with TF-IDF and Chi2.
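The TF-IDF weighting named in the abstract above can be sketched in a few lines. This uses the textbook definition (relative term frequency multiplied by the log of inverse document frequency); library implementations often apply smoothed or normalized variants, and the abstract does not say which was used. The function name `tfidf` and the example reviews are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per document using unsmoothed TF-IDF.

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical reviews for illustration only.
reviews = ["great app works well", "app crashes often", "great support"]
for review, weights in zip(reviews, tfidf(reviews)):
    print(review, "->", weights)
```

Terms that occur in few reviews (such as "crashes") receive higher weights than terms shared across many reviews (such as "app"), which is why TF-IDF features can help separate happy from unhappy reviews.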
The use of data from social networks such as Twitter has increased during the last few years to improve political campaigns, quality of products and services, sentiment analysis, etc. Tweet classification based on user sentiments is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to help sentiment analysis for such organizations. The VC is based on logistic regression (LR) and stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative, and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall, and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. M...
Blood cancer has been a growing concern during the last decade and requires early diagnosis to st... more Blood cancer has been a growing concern during the last decade and requires early diagnosis to start proper treatment. The diagnosis process is costly and time-consuming involving medical experts and several tests. Thus, an automatic diagnosis system for its accurate prediction is of significant importance. Diagnosis of blood cancer using leukemia microarray gene data and machine learning approach has become an important medical research today. Despite research efforts, desired accuracy and efficiency necessitate further enhancements. This study proposes an approach for blood cancer disease prediction using the supervised machine learning approach. For the current study, the leukemia microarray gene dataset containing 22,283 genes, is used. ADASYN resampling and Chi-squared (Chi2) features selection techniques are used to resolve imbalanced and high-dimensional dataset problems. ADASYN generates artificial data to make the dataset balanced for each target class, and Chi2 selects the...
Rapid urbanization to meet the needs of the growing population has led to several challenges such... more Rapid urbanization to meet the needs of the growing population has led to several challenges such as pollution, increased and congested traffic, poor sustainability, and impact on the ecological environment. The conception of smart cities comprising intelligent convergence systems has been regarded as a potential solution to overcome these problems. Based on the information, communications, and technology (ICT), the idea of a smart city has emerged to decrease the impact of rapid urbanization. In this context, important efforts have been made for making cities smarter and more sustainable. However, the challenges associated with the implementation and evaluation of smart cities in developing countries are not examined appropriately, particularly in the Moroccan context. To analyze the efficacy and success of such efforts, the evaluation and comparisons using common frameworks are significantly important. For this purpose, the present research aims to investigate and evaluate the mos...
The spread of altered media in the form of fake videos, audios, and images, has been largely incr... more The spread of altered media in the form of fake videos, audios, and images, has been largely increased over the past few years. Advanced digital manipulation tools and techniques make it easier to generate fake content and post it on social media. In addition, tweets with deep fake content make their way to social platforms. The polarity of such tweets is significant to determine the sentiment of people about deep fakes. This paper presents a deep learning model to predict the polarity of deep fake tweets. For this purpose, a stacked bi-directional long short-term memory (SBi-LSTM) network is proposed to classify the sentiment of deep fake tweets. Several well-known machine learning classifiers are investigated as well such as support vector machine, logistic regression, Gaussian Naive Bayes, extra tree classifier, and AdaBoost classifier. These classifiers are utilized with term frequency-inverse document frequency and a bag of words feature extraction approaches. Besides, the perf...
Social media platforms and microblogging websites have gained accelerated popularity during the p... more Social media platforms and microblogging websites have gained accelerated popularity during the past few years. These platforms are used for expressing views and opinions about products, personalities, and events. Often during discussions and debates, fights take place on social media platforms which involves using rude, disrespectful, and hateful comments called toxic comments. The identification of toxic comments has been regarded as an essential element for social media platforms. This study introduces an ensemble approach, called regression vector voting classifier (RVVC), to identify the toxic comments on social media platforms. The ensemble merges the logistic regression and support vector classifier under soft voting criteria. Several experiments are performed on the imbalanced and balanced dataset to analyze the performance of the proposed approach. For data balance, the synthetic minority oversampling technique (SMOTE) is used on the imbalanced dataset. Furthermore, two feature extraction approaches are utilized to investigate their suitability such as term frequency-inverse document frequency (TF-IDF) and bag-of-words (BoW). The performance of the proposed approach is compared with several machine learning classifiers using accuracy, precision, recall, and F1-score. Results suggest that RVVC outperforms all other individual models when TF-IDF features are used with SMOTE balanced dataset and achieves an accuracy of 0.97.
Regular inspection of railway track health is crucial for maintaining safe and reliable train ope... more Regular inspection of railway track health is crucial for maintaining safe and reliable train operations. Factors, such as cracks, ballast issues, rail discontinuity, loose nuts and bolts, burnt wheels, superelevation, and misalignment developed on the rails due to non-maintenance, pre-emptive investigations and delayed detection, pose a grave danger and threats to the safe operation of rail transport. The traditional procedure of manually inspecting the rail track using a railway cart is both inefficient and prone to human error and biases. In a country like Pakistan where train accidents have taken many lives, it is not unusual to automate such approaches to avoid such accidents and save countless lives. This study aims at enhancing the traditional railway cart system to address these issues by introducing an automatic railway track fault detection system using acoustic analysis. In this regard, this study makes two important contributions: data collection on Pakistan railway trac...
Wireless capsule endoscopy (WCE) is an efficient tool to investigate gastrointestinal tract disor... more Wireless capsule endoscopy (WCE) is an efficient tool to investigate gastrointestinal tract disorders and perform painless imaging of the intestine. Despite that, several concerns make its wide applicability and adaptation challenging like efficacy, tolerance, safety, and performance. Besides, automatic analysis of the WCE provided dataset is of great importance for detecting abnormalities. Imaging of the patient's digestive tract through WCE produces a large dataset that requires a substantial amount of time and a special skill set from a medical practitioner for analysis. Several computer-aided and vision-based solutions have been proposed to resolve these issues, yet, they do not provide the desired level of accuracy and further improvements are still needed. The current study aims to devise a system that can perform the task of automatic analysis of WCE images to identify abnormalities and assist practitioners for robust diagnosis. This study adopts a deep neural network approach and proposes a model name BIR (bleedy image recognizer) that combines the MobileNet with a custom-built convolutional neural network (CNN) model to classify WCE bleedy images. BIR uses the MobileNet model for initial-level computation for its lower computation power requirement and subsequently the output is fed to the CNN for further processing. A dataset of 1650 WCE images is used to train and test the model using the measures of accuracy, precision, recall, F1 score, and Cohen's kappa to evaluate the performance of the BIR. Results indicate the promising outcomes with achieved accuracy, precision, recall, F1 score, and Cohen's kappa of 0.993, 1.000, 0.994, 0.997, and 0.995 respectively. The accuracy of the BIR model is 0.978 with the Google collected WCE image dataset which is better than the state-of-art approaches. 
INDEX TERMS Wireless capsule endoscopy, deep learning, computer vision, gastrointestinal tract infection, classification, convolutional neural networks.
Sarcasm emerges as a common phenomenon across social networking sites because people express thei... more Sarcasm emerges as a common phenomenon across social networking sites because people express their negative thoughts, hatred and opinions using positive vocabulary which makes it a challenging task to detect sarcasm. Although various studies have investigated the sarcasm detection on baseline datasets, this work is the first to detect sarcasm from a multi-domain dataset that is constructed by combining Twitter and News Headlines datasets. This study proposes a hybrid approach where the convolutional neural networks (CNN) are used for feature extraction while the long short-term memory (LSTM) is trained and tested on those features. For performance analysis, several machine learning algorithms such as random forest, support vector classifier, extra tree classifier and decision tree are used. The performance of both the proposed model and machine learning algorithms is analyzed using the term frequency-inverse document frequency, bag of words approach, and global vectors for word repr...
The spread of Covid-19 has resulted in worldwide health concerns. Social media is increasingly us... more The spread of Covid-19 has resulted in worldwide health concerns. Social media is increasingly used to share news and opinions about it. A realistic assessment of the situation is necessary to utilize resources optimally and appropriately. In this research, we perform Covid-19 tweets sentiment analysis using a supervised machine learning approach. Identification of Covid-19 sentiments from tweets would allow informed decisions for better handling the current pandemic situation. The used dataset is extracted from Twitter using IDs as provided by the IEEE data port. Tweets are extracted by an in-house built crawler that uses the Tweepy library. The dataset is cleaned using the preprocessing techniques and sentiments are extracted using the TextBlob library. The contribution of this work is the performance evaluation of various machine learning classifiers using our proposed feature set. This set is formed by concatenating the bag-of-words and the term frequency-inverse document freque...
Machine learning (ML) based forecasting mechanisms have proved their significance to anticipate i... more Machine learning (ML) based forecasting mechanisms have proved their significance to anticipate in perioperative outcomes to improve the decision making on the future course of actions. The ML models have long been used in many application domains which needed the identification and prioritization of adverse factors for a threat. Several prediction methods are being popularly used to handle forecasting problems. This study demonstrates the capability of ML models to forecast the number of upcoming patients affected by COVID-19 which is presently considered as a potential threat to mankind. In particular, four standard forecasting models, such as linear regression (LR), least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and exponential smoothing (ES) have been used in this study to forecast the threatening factors of COVID-19. Three types of predictions are made by each of the models, such as the number of newly infected cases, the number of deaths, and the number of recoveries in the next 10 days. The results produced by the study proves it a promising mechanism to use these methods for the current scenario of the COVID-19 pandemic. The results prove that the ES performs best among all the used models followed by LR and LASSO which performs well in forecasting the new confirmed cases, death rate as well as recovery rate, while SVM performs poorly in all the prediction scenarios given the available dataset. INDEX TERMS COVID-19, exponential smoothing method, future forecasting, adjusted R 2 score, supervised machine learning.
Amid the worldwide COVID-19 pandemic lockdowns, the closure of educational institutes led to an unprecedented rise in online learning. To limit the impact of COVID-19 and curb its spread, educational institutions closed their campuses immediately, and academic activities were moved to e-learning platforms. The effectiveness of e-learning is a critical concern for both students and parents, specifically in terms of its suitability for students and teachers and its technical feasibility across different social scenarios. Such concerns must be reviewed from several aspects before e-learning can be adopted at such a large scale. This study investigates the effectiveness of e-learning by analyzing the sentiments of people about e-learning. Due to the recent rise of social media as an important mode of communication, people’s views can be found on platforms such as Twitter, Instagram, Facebook, etc. This study uses a Twitter dataset containing 17,155 ...
The recent development of machines exhibiting intelligent characteristics involves numerous techniques, including computer hardware and software architecture development. Many different hardware devices, wearable sensors, and machine learning and deep learning model implementations have recently been applied in human activity recognition (HAR) applications. However, developing high-accuracy classification systems for activity recognition using low-cost hardware technology is of significant importance. To achieve this goal, this study uses data from two low-cost sensors, a gyroscope and an accelerometer, along with an Artificial Neural Network (ANN) based deep learning model for HAR. In particular, a Deep Stacked Multilayered Perceptron (DS-MLP) has been proposed. In the implementation of DS-MLP, an ANN model has been used as a meta-learner while five MLP models have been used as base-learners. In this study, these base-learners and the meta-learner have been combined using a stack ensemble technique. The performance evaluations have been done first on the applicability of the individual base-models, followed by the application of DS-MLP. The results show high accuracies of 97.3% and 99.4% for the heterogeneous datasets used for testing. The performance of the proposed DS-MLP models has been compared to some existing machine learning classifiers and several state-of-the-art activity recognition systems. The comparative result analysis also shows that the proposed system performed better than these classification approaches in terms of important performance metrics such as accuracy, precision, recall, F-score, Cohen's Kappa, and Matthews correlation coefficient.
In recent years, the classification of class-imbalanced data has received increasing attention across different scientific areas such as fraud detection, metabolomics, cancer diagnosis, etc. This interest follows evidence of the negative effect of class overlap on the performance of class-imbalanced learning. Based on the augmented R-value, our proposed strategy aims to select features that give the data the minimal degree of overlap, thereby improving classification performance as well. In this context, we present three feature selection algorithms, RONS (Reduce Overlapping with No-sampling), ROS (Reduce Overlapping with SMOTE), and ROA (Reduce Overlapping with ADASYN), which are built through sparse feature selection to minimize the overlap and perform binary classification. In addition, a re-sampling process has been included in both ROS and ROA. Simulation results show that our proposed algorithms, as feature selection methods, manage the variation of the false discovery rate during the selection of main features for the process modeling. For the experiment, four credit card datasets have been selected to test the performance of our algorithms. Using the F-measure and G-mean evaluation metrics, the results indicate that our proposed algorithms are preferable to classical feature selection methods. Besides, this effective feature selection strategy can be extended as an alternative for dealing with class-imbalanced learning problems that involve overlapping.
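The two evaluation metrics named in the abstract above can be computed directly from binary confusion-matrix counts. The sketch below uses the standard definitions (F-measure as the weighted harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion counts; beta=1 gives the usual F1 score."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Made-up counts for an imbalanced problem: 10 positives, 90 negatives.
f1 = f_measure(tp=8, fp=2, fn=2)
gm = g_mean(tp=8, fp=2, tn=88, fn=2)

# A classifier that always predicts the majority class reaches 90%
# accuracy on the same data, yet both metrics collapse to zero.
f1_majority = f_measure(tp=0, fp=0, fn=10)
gm_majority = g_mean(tp=0, fp=0, tn=90, fn=10)
```

The last two values show why these metrics are generally preferred over plain accuracy when classes are imbalanced.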
Artificial intelligence (AI) techniques in general and convolutional neural networks (CNNs) in particular have attained successful results in medical image analysis and classification. A deep CNN architecture is proposed in this paper for the diagnosis of COVID-19 based on chest X-ray image classification. Due to the unavailability of a sufficiently large, good-quality chest X-ray image dataset, effective and accurate CNN classification was a challenge. To deal with these complexities, such as a very small and imbalanced dataset with image-quality issues, the dataset has been preprocessed in different phases using different techniques to produce an effective training dataset with which the proposed CNN model can attain its best performance. The preprocessing stages performed in this study include dataset balancing, medical experts’ image analysis, and data augmentation. The experimental results have shown the overall accuracy as high as 9...
App stores usually allow users to give reviews and ratings that are used by developers to resolve issues and make plans for their apps. In this way, these app stores collect large amounts of data for analysis. However, several challenges related to redundancy and the volume of data must first be addressed by using machine learning. This study performs experiments on a dataset that contains reviews for Shopify apps. To overcome the aforementioned limitations, we first categorize user reviews into two groups, i.e., happy and unhappy, and then perform preprocessing on the reviews to clean the data. At a later stage, several feature engineering techniques, such as bag-of-words, term frequency-inverse document frequency (TF-IDF), and chi-square (Chi2), are used singly and in combination to preserve meaningful information. Finally, the random forest, AdaBoost classifier, and logistic regression models are used to classify the reviews as happy or unhappy. The performance of our proposed pipeline was evaluated using average accuracy, precision, recall, and F1 score. The experiments reveal that a combination of features can improve the performance of machine learning models, and in this study, logistic regression outperforms the others and achieves an 83% true acceptance rate when combined with TF-IDF and Chi2.
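To make the idea of using features "in combination" concrete, here is a minimal sketch that builds bag-of-words counts and TF-IDF weights over a shared vocabulary and concatenates them into one vector per review. Everything in it is an assumption for illustration: the whitespace tokenizer, the `log(N / df) + 1` IDF variant, the helper name, and the toy reviews. It also omits the Chi2 selection step, and it is not the pipeline used in the study.

```python
import math
from collections import Counter


def bow_tfidf_features(documents):
    """Return (vocabulary, features), one concatenated vector per document.

    Hypothetical helper: each vector is [BoW counts..., TF-IDF weights...],
    so its length is twice the vocabulary size.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    # Assumed IDF variant; other smoothing schemes are equally common.
    idf = {term: math.log(n_docs / doc_freq[term]) + 1.0 for term in vocabulary}

    features = []
    for doc in tokenized:
        counts = Counter(doc)
        bow = [counts[term] for term in vocabulary]
        tfidf = [counts[term] * idf[term] for term in vocabulary]
        features.append(bow + tfidf)  # concatenate the two representations
    return vocabulary, features


# Toy reviews (made up).
vocabulary, features = bow_tfidf_features(["great app", "bad app"])
```

Terms that appear in every review (here "app") keep the minimum IDF of 1.0, while rarer terms receive a larger weight in the TF-IDF half of the vector.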
The use of data from social networks such as Twitter has increased during the last few years to improve political campaigns, quality of products and services, sentiment analysis, etc. Classifying tweets based on user sentiments is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to help sentiment analysis for such organizations. The VC is based on logistic regression (LR) and a stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative, and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall, and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. M...
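The soft voting mechanism mentioned in the abstract above can be sketched as follows: each classifier outputs a probability per class, the probabilities are averaged (optionally with weights), and the class with the highest average wins. The function name and the probability values are hypothetical; the two dictionaries merely stand in for the outputs of the LR and SGDC models.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities and return (label, averaged scores).

    `probabilities` is a list of dicts, one per classifier, mapping
    class label -> predicted probability. Equal weights by default.
    """
    if not probabilities:
        raise ValueError("at least one classifier output is required")
    if weights is None:
        weights = [1.0] * len(probabilities)
    total_weight = sum(weights)

    averaged = {}
    for label in probabilities[0]:
        weighted_sum = sum(w * p[label] for w, p in zip(weights, probabilities))
        averaged[label] = weighted_sum / total_weight

    best_label = max(averaged, key=averaged.get)
    return best_label, averaged


# Made-up outputs standing in for the two base classifiers.
lr_probs = {"positive": 0.55, "negative": 0.35, "neutral": 0.10}
sgdc_probs = {"positive": 0.30, "negative": 0.60, "neutral": 0.10}

label, averaged = soft_vote([lr_probs, sgdc_probs])
```

In this toy case the two classifiers disagree on the top class, so hard (majority) voting would tie, whereas soft voting favors the class that was predicted with more confidence.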
Papers by Furqan Rustam