Ali Mustafa Qamar

National University of Sciences & Technology (NUST), Computing, Faculty Member

Phone: 00923435127473
Address: SEECS, NUST, Sector H-12, Islamabad, Pakistan

Papers by Ali Mustafa Qamar

Feature Selection Optimization in Software Product Lines

IEEE Access

Feature modeling is a common approach for configuring and capturing commonalities and variations among different Software Product Lines (SPL) products. This process is carried out by a set of SPL design teams, each working on a different configuration of the desired product. The integration of these configurations leads to inconsistencies in the final product design. The typical solution involves extensive deliberation and unnecessary resource usage, which makes SPL inconsistency resolution an expensive and unoptimized process. We present the first comprehensive evaluation of swarm intelligence (using Particle Swarm Optimization) applied to the problem of resolving inconsistencies in a configured integrated SPL product. We call it o-SPLIT (optimization-based Software Product LIne Tool) and validate o-SPLIT with standard ERP, SPLOT (Software Product Lines Online Tools), and BeTTy (BEnchmarking and TesTing on the analYsis) product configurations along with diverse feature set sizes. The results show that Particle Swarm Optimization can successfully optimize SPL product configurations. Finally, we implement o-SPLIT as a decision-support tool in a real, local SPL setting and acquire subjective feedback from SPL designers, which shows that the teams are convinced of the usability and high-level decision support provided by o-SPLIT.
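
As a rough illustration of the Particle Swarm Optimization technique named in the abstract above, the following is a minimal generic sketch. It is not o-SPLIT: the continuous toy objective (`sphere`), the function name `pso_minimize`, and every parameter value (inertia, cognitive and social weights, swarm size, bounds, seed) are assumptions chosen only for illustration. A real SPL configuration problem would need an encoding of feature selections and consistency constraints that the abstract does not specify.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=200,
                 inertia=0.7, cognitive=1.5, social=1.5, bound=5.0, seed=0):
    """Minimize `objective` over `dim` real variables with a basic global-best PSO."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    positions = [[rng.uniform(-bound, bound) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one global best.
    personal_best = [p[:] for p in positions]
    personal_best_value = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=lambda i: personal_best_value[i])
    global_best = personal_best[best_index][:]
    global_best_value = personal_best_value[best_index]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own best,
                # and pull toward the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_best_value[i]:
                personal_best[i] = positions[i][:]
                personal_best_value[i] = value
                if value < global_best_value:
                    global_best = positions[i][:]
                    global_best_value = value
    return global_best, global_best_value


def sphere(x):
    """Toy objective (an assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimize(sphere, dim=2)
print(best_position, best_value)
```

The global-best variant is used here because it is the simplest form of the algorithm; other neighborhood topologies or a binary variant may suit discrete feature selection better.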

Improving Sentiment Analysis of Arabic Tweets by One-way ANOVA

Journal of King Saud University - Computer and Information Sciences

Advances and Trends in Real Time Visual Crowd Analysis

Sensors

Real time crowd analysis represents an active area of research within the computer vision community in general and scene analysis in particular. Over the last 10 years, various methods for crowd management in real time scenarios have received immense attention due to large scale applications in people counting, public events management, disaster management, safety monitoring and so on. Although many sophisticated algorithms have been developed to address the task, crowd management in real time conditions is still a challenging problem that is far from completely solved, particularly in wild and unconstrained conditions. In this paper, we present a detailed review of crowd analysis and management, focusing on state-of-the-art methods for both controlled and unconstrained conditions. The paper illustrates both the advantages and disadvantages of state-of-the-art methods. The methods presented comprise the seminal research works on crowd management, and monitoring and then culminating state...

Corrigendum: Water Supply 20 (1), 28–45: Water quality monitoring: from conventional to emerging technologies, Umair Ahmed et al

Water Supply

A Blockchain-Based Architecture for Smart Healthcare System: A Case Study of Saudi Arabia

Advances in Science, Technology and Engineering Systems Journal

eHealth is the use of Information and Communication Technologies (ICT) to enhance health quality access. Health care innovations are an essential element of Vision 2030 and the National Transformation Program (NTP) 2020 operational plan that will lead to an improvement in the quality of health care in Saudi Arabia. The objective of this paper is to address the healthcare system limitations in the Kingdom of Saudi Arabia in general and the Qassim region in particular with respect to global standards. In practice, the creation of an infrastructure for storing and seamless sharing of health data between different entities is studied. Furthermore, an in-depth analysis of current practices w.r.t. health data, as well as a search for similarities between NTP 2020 and Health 2020 (European Union), has been performed. A multi-level blockchain eHealth system is proposed to provide seamless electronic health records to the patients. Moreover, the current practices being employed in Qassim province, KSA have been analyzed. It was found that private hospitals give access to medical reports to their patients besides allowing them to manage their appointments. On the other hand, access to the government hospitals' medical records is minimal.

A Secure Data Sharing Platform Using Blockchain and Interplanetary File System

Sustainability

In a research community, data sharing is an essential step to gain maximum knowledge from prior work. Existing data sharing platforms depend on a trusted third party (TTP). Due to the involvement of the TTP, such systems lack trust, transparency, security, and immutability. To overcome these issues, this paper proposes a blockchain-based secure data sharing platform that leverages the benefits of the interplanetary file system (IPFS). The metadata is uploaded to an IPFS server by the owner and then divided into n secret shares. The proposed scheme achieves security and access control by executing the access roles written in a smart contract by the owner. Users are first authenticated through RSA signatures and then submit the requested amount as the price of the digital content. After the successful delivery of data, the user is encouraged to register reviews about the data. These reviews are validated through Watson analyzer to filter out fake reviews. The customers registering valid reviews are given in...

Quranic Reciter Recognition: A Machine Learning Approach

Advances in Science, Technology and Engineering Systems Journal

Recitation of and listening to the Holy Quran with Tajweed is an essential activity for a Muslim and is a part of the faith. In this article, we use a machine learning approach for Quran reciter recognition. We use a database of twelve Qaris who recite the last ten Surahs of the Quran. The twelve Qaris thus represent a 12-class problem. Two approaches are used for audio representation: first, the audio is analyzed in the frequency domain, and second, the audio is treated as images through spectrograms. The Mel Frequency Cepstral Coefficients (MFCC) and pitch are used as the features for model learning in the first case. In the second case of audio as images, auto-correlograms are used to extract features. In both cases, the features are learned with classical machine learning classifiers, namely Naïve Bayes, J48, and Random Forest. These classifiers are selected due to their overall good performance in the state of the art. It is observed that the classifiers can efficiently learn the separation between classes when the audio is represented by the MFCC and pitch features. In such a case, we get 88% recognition accuracy with Naïve Bayes and Random Forest, showing that a Qari can be effectively recognized from the recitation of Quranic verses.

Similarity Learning in Nearest Neighbor and Application to Information Retrieval

Many people have tried to learn a Mahalanobis distance metric in kNN classification by considering the geometry of the space containing the examples. However, similarity may have an edge, especially when dealing with text, e.g. in Information Retrieval. We have proposed an online algorithm, SiLA (Similarity Learning Algorithm), where the aim is to learn a similarity metric (e.g. cosine measure, Dice and Jaccard coefficients), and its variation eSiLA, where we project the learnt matrix onto the cone of positive semidefinite matrices. Two incremental algorithms have been developed: one based on the standard kNN rule, while the other is its symmetric version. SiLA can be used in Information Retrieval, where the performance can be improved by using user feedback.
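
To make the idea of a learned similarity metric more concrete, here is a minimal sketch of a cosine-style similarity parameterized by a matrix `A`. It only evaluates the measure for a given matrix; it does not implement SiLA's learning procedure or the eSiLA projection. The function name `generalized_cosine` and the choice to normalize by the ordinary Euclidean norms of the two vectors are assumptions made for illustration, and other normalizations are possible.

```python
import math


def generalized_cosine(x, y, A):
    """Return x^T A y divided by ||x|| * ||y|| (normalization is an assumption)."""
    # Matrix-vector product A y, written out so the sketch needs no third-party library.
    Ay = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    numerator = sum(x[i] * Ay[i] for i in range(len(x)))
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention for empty vectors
    return numerator / (norm_x * norm_y)


# With the identity matrix the measure reduces to the ordinary cosine.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]
y = [1.0, 1.0]
plain = generalized_cosine(x, y, identity)

# A hypothetical "learned" matrix that doubles the weight of the first dimension.
weighted = generalized_cosine(x, y, [[2.0, 0.0], [0.0, 1.0]])
print(plain, weighted)
```

Under this normalization the value is no longer guaranteed to stay within [-1, 1] once `A` differs from the identity, which is one reason a different normalization might be preferred in practice.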

Water quality monitoring: from conventional to emerging technologies

Water Supply

Rapid urbanization and industrial development have resulted in water contamination and water quality deterioration at an alarming rate, making quick, inexpensive, and accurate detection imperative. Conventional methods to measure water quality are lengthy, expensive, and inefficient, including the manual analysis process carried out in a lab. The research work in this paper approaches the problem from various perspectives, including the traditional methods of determining water quality, to gain insight into the problem, and the analysis of state-of-the-art technologies, including Internet of Things (IoT) and machine learning techniques, to address water quality. After analyzing the currently available solutions, this paper proposes an IoT-based low-cost system employing machine learning techniques to monitor water quality in real time, analyze water quality trends, and detect anomalous events such as intentional contamination of water.

TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data

Frontiers of Information Technology & Electronic Engineering

Taxonomy is generated to effectively organize and access data that is large in volume, as taxonomy is a way of representing concepts that exist in data. It needs to evolve to reflect changes that occur continuously in the data. Existing automatic taxonomy generation techniques do not handle the evolution of data; therefore, their generated taxonomies do not truly represent the data. The evolution of data can be handled either by regenerating the taxonomy from scratch or by incrementally evolving the taxonomy whenever changes occur in the data. The former approach is not economical in terms of time and resources. The Taxonomy Incremental Evolution (TIE) algorithm, proposed in this paper, is a novel attempt to handle evolving data. It serves as a layer over an existing clustering-based taxonomy generation technique and incrementally evolves an existing taxonomy. The algorithm was evaluated on scholarly articles selected from the computing domain. It was found that the algorithm evolves the taxonomy in a considerably shorter period of time, with better quality per unit time as compared to the taxonomy regenerated from scratch.

Classification and legality analysis of bowling action in the game of cricket

Data Mining and Knowledge Discovery

Sentiment classification of tweets using hierarchical classification

2016 IEEE International Conference on Communications (ICC), 2016

This paper addresses the problem of sentiment classification of short messages on microblogging platforms. We apply machine learning and pattern recognition techniques to design and implement a classification system for microblog messages, assigning each to one of three classes: positive, negative or neutral. As part of this work, we contributed a dataset consisting of approximately 10,000 tweets, each labeled on a five-point sentiment scale by three different people. Experiments demonstrate a detection rate between approximately 70% and an average false alarm rate of approximately 18% across all three classes. The developed classifier has been made available for online use.

Working Notes for the InFile Campaign: Online Document Filtering Using 1 Nearest Neighbor

This paper has been written as a part of the InFile (IN-Formation, FILtering, Evaluation) campaign. This project is a cross-language adaptive filtering evaluation campaign, sponsored by the French national research agency, and it is a pilot track of the CLEF (Cross Language Evaluation Forum) 2008 campaigns. We propose in this paper an online algorithm to learn category-specific thresholds in a multiclass environment where a document can belong to more than one class. Our method uses the 1 Nearest Neighbor (1NN) algorithm for classification. It uses simulated user feedback to fine-tune the threshold and, in turn, the classification performance over time. The experiments were run on an English language corpus containing 100,000 documents. The best results have a precision of 0.366 and a recall of 0.260.
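
The combination of 1NN scoring with per-category thresholds tuned by feedback can be sketched as follows. This is an illustrative toy, not the algorithm from the paper: the class name `OneNNFilter`, the use of cosine similarity as the 1NN score, the initial threshold, the fixed step size, and the specific update rule (raise the threshold after a false acceptance, lower it after a missed relevant document) are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


class OneNNFilter:
    """Toy online filter: one threshold per category, scored by the nearest neighbor."""

    def __init__(self, initial_threshold=0.5, step=0.05):
        self.initial_threshold = initial_threshold  # assumed starting value
        self.step = step                            # assumed fixed adjustment
        self.examples = {}    # category -> list of known relevant vectors
        self.thresholds = {}  # category -> current acceptance threshold

    def add_example(self, category, vector):
        self.examples.setdefault(category, []).append(vector)
        self.thresholds.setdefault(category, self.initial_threshold)

    def score(self, category, vector):
        # 1NN: similarity to the single closest stored example of the category.
        stored = self.examples.get(category, [])
        return max((cosine(vector, e) for e in stored), default=0.0)

    def accepts(self, category, vector):
        threshold = self.thresholds.get(category, self.initial_threshold)
        return self.score(category, vector) >= threshold

    def feedback(self, category, vector, relevant):
        # Decide first, then adjust the threshold according to the (simulated) feedback.
        accepted = self.accepts(category, vector)
        threshold = self.thresholds.get(category, self.initial_threshold)
        if accepted and not relevant:
            threshold += self.step  # false acceptance: be stricter
        elif relevant and not accepted:
            threshold -= self.step  # missed document: be more lenient
        self.thresholds[category] = threshold
        if relevant:
            self.add_example(category, vector)


flt = OneNNFilter()
flt.add_example("sports", [1.0, 0.0])
doc = [1.0, 1.0]
accepted_before = flt.accepts("sports", doc)
flt.feedback("sports", doc, relevant=False)
print(accepted_before, flt.thresholds["sports"])
```

Because every category keeps its own threshold and is tested independently, a single document can be accepted by several categories, which matches the multi-label setting the abstract describes.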

Generalized Cosine and Similarity Metrics: A Supervised Learning Approach based on Nearest Neighbors

Data analysis, quality indexing and prediction of water quality for the management of rawal watershed in Pakistan

Eighth International Conference on Digital Information Management (ICDIM 2013), 2013

In contrast to managing the water quality only at the command level (where water is being consumed), one should also give importance to the water quality in the areas where water is being produced, i.e. the watersheds. The failure to do so deteriorates the water quality downstream and poses serious challenges for water managers in meeting the water quality requirements on a sustainable basis. In order to have effective water management in command areas, it is essential to assess different aspects of water quality. Rawal watershed is a relatively small watershed area which is being affected by anthropogenic activities, e.g. urbanization, deforestation etc. In this paper, we present the last four years (2009–2012) of trends in water quality related parameters, along with month-wise as well as source-wise parametric satisfactory analysis against WHO quality standards. Moreover, we applied regression models to check the seasonal water quality trends. The quality indices were analyzed by a combination of supervised and unsupervised machine learning techniques. Different sources of fecal coliform contamination were also identified. Lastly, the possible reasons for high contamination were identified by studying the watershed land covers. Our research suggests that, in order to find the quality index of water, the Average Linkage (Within Groups) method of Hierarchical Clustering using Euclidean distance is an accurate unsupervised learning technique. Similarly, for classification, the Multi-Layer Perceptron (MLP) has been found to be the more accurate supervised learning technique. Higher values of fecal coliforms were found in the months of March, June, July, and October. Some of the possible reasons are land covers, especially scrub forest and rain-fed agriculture areas, poultry farms, and population settled around the streams.

A semantic rules & reasoning based approach for Diet and Exercise management for diabetics

Predicting New Collaborations in Academic Citation Networks of IEEE and ACM Conferences

In this paper we study the time evolution of academic collaboration networks by predicting the appearance of new links between authors. The accurate prediction of new collaborations between members of a collaboration network can help accelerate the realization of new synergies, foster innovation, and raise productivity. For this study, the authors collected a large data set of publications from 630 conferences of the IEEE and ACM, covering more than 257,000 authors and 61,000 papers and capturing more than 818,000 collaborations spanning a period of 10 years. The data set is rich in semantic data that allows exploration of many features that were not considered in previous approaches. We considered a comprehensive set of 98 features, and after processing identified eight features as significant. Most notably, we identified two new features as the most significant predictors of future collaborations: 1) the number of common title words, and 2) the number of common references in two authors' papers. The link prediction problem is formulated as a binary classification problem, and three different supervised learning algorithms are evaluated, i.e. Naïve Bayes, C4.5 decision tree and Support Vector Machines. Extensive efforts are made to ensure complete spatial isolation of the information used in training and test instances, which to the authors' best knowledge is unprecedented. Results were validated using a modified form of the classic 10-fold cross validation (the change was necessitated by the way training and test instances were separated). The Support Vector Machine classifier performed the best among the tested approaches, correctly classifying on average more than 80% of test instances with an area under the receiver operating characteristic (ROC) curve greater than 0.80.
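
The two features highlighted in the abstract, the number of common title words and the number of common references between two authors' papers, could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names, the lowercase whitespace tokenization, the absence of stop-word filtering, and the sample titles and reference identifiers are all hypothetical and not taken from the paper.

```python
def common_title_words(titles_a, titles_b):
    """Count distinct words shared by the paper titles of two authors."""
    # Tokenization is an assumption: lowercase and split on whitespace, no stop-word removal.
    words_a = {word.lower() for title in titles_a for word in title.split()}
    words_b = {word.lower() for title in titles_b for word in title.split()}
    return len(words_a & words_b)


def common_references(refs_a, refs_b):
    """Count distinct references cited by both authors (references given as identifiers)."""
    return len(set(refs_a) & set(refs_b))


# Hypothetical data for two authors.
author_a_titles = ["Similarity learning for text", "Nearest neighbor search"]
author_b_titles = ["Learning to rank text"]
author_a_refs = ["ref-1", "ref-2", "ref-3"]
author_b_refs = ["ref-2", "ref-3", "ref-4"]

feature_vector = [
    common_title_words(author_a_titles, author_b_titles),
    common_references(author_a_refs, author_b_refs),
]
print(feature_vector)
```

A vector like this, built for each candidate pair of authors, is the kind of input a binary classifier would receive in the link prediction formulation described above.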

Rot-SiLA: A Novel Ensemble Classification Approach Based on Rotation Forest and Similarity Learning Using Nearest Neighbor Algorithm

2013 12th International Conference on Machine Learning and Applications, 2013

Recent years have seen a great inclination towards Machine Learning classification, and researchers are thinking in terms of achieving accuracy and correctness. Many studies have shown that an ensemble of classifiers outperforms individual ones in terms of accuracy. Qamar et al. have developed a Similarity Learning Algorithm (SiLA) based on a combination of the k nearest neighbor algorithm and the Voted Perceptron. This approach is different from other state-of-the-art algorithms in the sense that it learns appropriate similarity metrics rather than distance-based ones for all types of datasets, i.e. textual as well as non-textual. In this paper, we present a novel ensemble classifier, Rot-SiLA, which is developed by combining the Rotation Forest algorithm and SiLA. The Rot-SiLA ensemble classifier is built upon two types of approaches, one based on standard kNN and another based on symmetric kNN (SkNN), just as was the case with the SiLA algorithm. It has been observed that the Rot-SiLA ensemble significantly outperforms other variants of the Rotation Forest ensemble as well as SiLA when experiments were conducted with 14 UCI repository data sets. The significance of the results was determined by s-test.

Dynamic entity and relationship extraction from news articles

2012 International Conference on Emerging Technologies, 2012

In structured as well as unstructured data, information extraction (IE) and information retrieval (IR) techniques are gaining popularity in order to produce a realistic output. The Internet users are growing day by day and becoming a popular source for spreading the information through news/blogs etc. To monitor this information, a lot of quality work has been done in that perspective. Related to news monitoring, our proposed unsupervised machine learning approach will fetch the entities and relationships from the ...

Similarity learning in nearest neighbor, positive semi-definitiveness and RELIEF algorithm

2010 International Conference of Soft Computing and Pattern Recognition, 2010

In this paper, we develop a similarity learning version of RELIEF algorithm, called RBS-PSD (for RELIEF-Based Similarity learning) where the learned similarity matrix is projected onto the set of positive, semi-definite matrices. Unfortunately, this algorithm does not perform very well in practice since it does not try to optimize the leave-one-out error or the 0-1 loss. This motivated us to develop its stricter version, called sRBS-PSD, which aims at reducing a cost function closer to the 0-1 loss. In the case of sRBS-PSD also, the ...

Feature Selection Optimization in Software Product Lines

IEEE Access

Feature modeling is a common approach for configuring and capturing commonalities and variations ... more Feature modeling is a common approach for configuring and capturing commonalities and variations among different Software Product Lines (SPL) products. This process is carried out by a set of SPL design teams, each working on a different configuration of the desired product. The integration of these configurations leads to inconsistencies in the final product design. The typical solution involves extensive deliberation and unnecessary resource usage, which makes SPL inconsistency resolution an expensive and unoptimized process. We present the first comprehensive evaluation of swarm intelligence (using Particle Swarm Optimization) to the problem of resolving inconsistencies in a configured integrated SPL product. We call it o-SPLIT (optimization-based Software Product LIne Tool) and validate o-SPLIT with standard ERP, SPLOT (Software Product Lines Online Tools), and BeTTy (BEnchmarking and TesTing on the analYsis) product configurations along with diverse feature set sizes. The results show that Particle Swarm Optimization can successfully optimize SPL product configurations. Finally, we implement o-SPLIT as a decision-support tool in a real, local SPL setting and acquire subjective feedback from SPL designers which shows that the teams are convinced of the usability and high-level decision support provided by o-SPLIT.

Improving Sentiment Analysis of Arabic Tweets by One-way ANOVA

Journal of King Saud University - Computer and Information Sciences

Advances and Trends in Real Time Visual Crowd Analysis

Sensors

Real time crowd analysis represents an active area of research within the computer vision communi... more Real time crowd analysis represents an active area of research within the computer vision community in general and scene analysis in particular. Over the last 10 years, various methods for crowd management in real time scenario have received immense attention due to large scale applications in people counting, public events management, disaster management, safety monitoring an so on. Although many sophisticated algorithms have been developed to address the task; crowd management in real time conditions is still a challenging problem being completely solved, particularly in wild and unconstrained conditions. In the proposed paper, we present a detailed review of crowd analysis and management, focusing on state-of-the-art methods for both controlled and unconstrained conditions. The paper illustrates both the advantages and disadvantages of state-of-the-art methods. The methods presented comprise the seminal research works on crowd management, and monitoring and then culminating state...

Corrigendum: Water Supply 20 (1), 28–45: Water quality monitoring: from conventional to emerging technologies, Umair Ahmed et al

Water Supply

A Blockchain-Based Architecture for Smart Healthcare System: A Case Study of Saudi Arabia

Advances in Science, Technology and Engineering Systems Journal

eHealth is the use of Information and Communication Technologies (ICT) to enhance health quality ... more eHealth is the use of Information and Communication Technologies (ICT) to enhance health quality access. Health care innovations are an essential element of Vision 2030 and National Transformation Program (NTP) 2020 operational plan that will lead to an improvement in the quality of health care in Saudi Arabia. The objective of this paper is to address the healthcare system limitations in the Kingdom of Saudi Arabia in general and the Qassim region in particular to global standards. In practice, the creation of an infrastructure for storing and seamless sharing of health data between different entities is studied. Furthermore, an in-depth analysis of current practices w.r.t. the health data, as well as finding similarities between NTP 2020 and Health 2020 (European Union) has been performed. A multi-level blockchain eHealth system is proposed to provide seamless electronic health records to the patients. Moreover, the current practices being employed in Qassim province, KSA has been analyzed. It was found that private hospitals give access to medical reports to their patients besides allowing them to manage their appointments. On the other hand, access to the government hospitals' medical records is minimal.

A Secure Data Sharing Platform Using Blockchain and Interplanetary File System

Sustainability

In a research community, data sharing is an essential step to gain maximum knowledge from the pri... more In a research community, data sharing is an essential step to gain maximum knowledge from the prior work. Existing data sharing platforms depend on trusted third party (TTP). Due to the involvement of TTP, such systems lack trust, transparency, security, and immutability. To overcome these issues, this paper proposed a blockchain-based secure data sharing platform by leveraging the benefits of interplanetary file system (IPFS). A meta data is uploaded to IPFS server by owner and then divided into n secret shares. The proposed scheme achieves security and access control by executing the access roles written in smart contract by owner. Users are first authenticated through RSA signatures and then submit the requested amount as a price of digital content. After the successful delivery of data, the user is encouraged to register the reviews about data. These reviews are validated through Watson analyzer to filter out the fake reviews. The customers registering valid reviews are given in...

Quranic Reciter Recognition: A Machine Learning Approach

Advances in Science, Technology and Engineering Systems Journal

Recitation and listening of the Holy Quran with Tajweed is an essential activity as a Muslim and ... more Recitation and listening of the Holy Quran with Tajweed is an essential activity as a Muslim and is a part of the faith. In this article, we use a machine learning approach for the Quran Reciter recognition. We use the database of Twelve Qari who recites the last Ten Surah of Quran. The twelve Qari thus represents the 12-class problem. Two approaches are used for audio representation, firstly, the audio is analyzed in the frequency domain, and secondly, the audio is treated as images through Spectrogram. The Mel Frequency Cepstral Coefficients (MFCC) and Pitch are used as the features for model learning in the first case. In the second case of audio as images, Auto-correlograms are used to extract features. In both cases, the features are learned with the classical machine learning which includes the Naïve Bayes, J48, and the Random Forest. These classifiers are selected due to their overall good performance in the state-of-the-art. It is observed that classifiers can efficiently learn the separation between classes, when the audio is represented by the MFCC, and the Pitch features. In such a case, we get 88% recognition accuracy with the Naïve Bayes and the Random Forest showing that Qari can be effectively recognized from the recitation of the Quranic verses.

Similarity Learning in Nearest Neighbor and Application to Information Retrieval

Many people have tried to learn Mahanalobis distance metric in kNN classification by considering ... more Many people have tried to learn Mahanalobis distance metric in kNN classification by considering the geometry of the space containing examples. However, similarity may have an edge specially while dealing with text e.g. Information Retrieval. We have proposed an online algorithm, SiLA (Similarity learning algorithm) where the aim is to learn a similarity metric (e.g. cosine measure, Dice and Jaccard coefficients) and its variation eSiLA where we project the matrix learnt onto the cone of positive, semidefinite matrices. Two incremental algorithms have been developed; one based on standard kNN rule while the other one is its symmetric version. SiLA can be used in Information Retrieval where the performance can be improved by using user feedback.

Water quality monitoring: from conventional to emerging technologies

Water Supply

The rapid urbanization and industrial development have resulted in water contamination and water ... more The rapid urbanization and industrial development have resulted in water contamination and water quality deterioration at an alarming rate, deeming its quick, inexpensive and accurate detection imperative. Conventional methods to measure water quality are lengthy, expensive and inefficient, including the manual analysis process carried out in a lab. The research work in this paper focuses on the problem from various perspectives, including the traditional methods of determining water quality to gain insight into the problem and the analysis of state-of-the-art technologies, including Internet of Things (IoT) and machine learning techniques to address water quality. After analyzing the currently available solutions, this paper proposes an IoT-based low-cost system employing machine learning techniques to monitor water quality in real-time, analyze water quality trends and detect anomalous events such as intentional contamination of water.

TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data

Frontiers of Information Technology & Electronic Engineering

Taxonomy is generated to effectively organize and access data that is large in volume, as taxonom... more Taxonomy is generated to effectively organize and access data that is large in volume, as taxonomy is a way of representing concepts that exist in data. It needs to be evolved to reflect changes occur continuously in data. Existing automatic taxonomy generation techniques do not handle the evolution of data, therefore their generated taxonomies do not truly represent the data. The evolution of data can be handled either by regenerating taxonomy from scratch, or incrementally evolving taxonomy whenever changes occur in the data. The former approach is not economical subject to time and resources. Taxonomy incremental evolution (TIE) algorithm, proposed in this paper, is a novel attempt to handle an evolving data. It serves as a layer over an existing clustering-based taxonomy generation technique and incrementally evolves an existing taxonomy. The algorithm was evaluated on scholarly articles selected from computing domain. It was found that the algorithm evolves taxonomy in a considerably shorter period of time, having better quality per unit time as compared to the taxonomy regenerated from scratch.

Classification and legality analysis of bowling action in the game of cricket

Data Mining and Knowledge Discovery

Sentiment classification of tweets using hierarchical classification

2016 IEEE International Conference on Communications (ICC), 2016

This paper addresses the problem of sentiment classification of short messages on microblogging p... more This paper addresses the problem of sentiment classification of short messages on microblogging platforms. We apply machine learning and pattern recognition techniques to design and implement a classification system for microblog messages assigning them into one of three classes: positive, negative or neutral. As part of this work, we contributed a dataset consisting of approximately 10,000 tweets, each labeled on a five point sentiment scale by three different people. Experiments demonstrate a detection rate between approximately 70% and an average false alarm rate of approximately 18% across all three classes. The developed classifier has been made available for online use.

Working Notes for the InFile Campaign: Online Document Filtering Using 1 Nearest Neighbor

This paper has been written as a part of the InFile (IN-Formation, FILtering, Evaluation) campaign. This project is a cross-language adaptive filtering evaluation campaign, sponsored by the French national research agency, and it is a pilot track of the CLEF (Cross Language Evaluation Forum) 2008 campaigns. We propose in this paper an online algorithm to learn category-specific thresholds in a multiclass environment where a document can belong to more than one class. Our method uses the 1 Nearest Neighbor (1NN) algorithm for classification. It uses simulated user feedback to fine-tune the thresholds and, in turn, the classification performance over time. The experiments were run on an English-language corpus containing 100,000 documents. The best results have a precision of 0.366 and a recall of 0.260.
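As a rough illustration of the idea described in this abstract, the following minimal sketch keeps one acceptance threshold per category, scores a document by its similarity to the nearest stored example of that category, and nudges the threshold after each piece of feedback. The class name `OnlineThresholdFilter`, the cosine similarity, the fixed `step`, and the update rule itself are assumptions made for illustration; the abstract does not specify how the paper updates its thresholds.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineThresholdFilter:
    """Hypothetical 1NN filter with one adjustable threshold per category."""

    def __init__(self, initial_threshold=0.5, step=0.05):
        self.initial_threshold = initial_threshold
        self.step = step          # assumed fixed adjustment size
        self.examples = {}        # category -> list of document vectors
        self.thresholds = {}      # category -> current threshold

    def add_example(self, category, doc):
        self.examples.setdefault(category, []).append(doc)
        self.thresholds.setdefault(category, self.initial_threshold)

    def score(self, category, doc):
        # 1NN score: similarity to the closest stored example of the category.
        stored = self.examples.get(category, [])
        return max((cosine(doc, e) for e in stored), default=0.0)

    def predict(self, doc):
        # A document may be assigned to several categories at once.
        return {c for c in self.examples
                if self.score(c, doc) >= self.thresholds[c]}

    def feedback(self, category, doc, relevant):
        # Assumed rule: a confirmed document becomes a new example and the
        # threshold is relaxed; a rejected one makes the threshold stricter.
        if relevant:
            self.add_example(category, doc)
            self.thresholds[category] = max(0.0, self.thresholds[category] - self.step)
        else:
            self.thresholds[category] = min(1.0, self.thresholds[category] + self.step)
```

In use, `predict` would be called on each incoming document and `feedback` once per accepted category, so the thresholds drift with the simulated user's judgments over time.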

Generalized Cosine and Similarity Metrics: A Supervised Learning Approach based on Nearest Neighbors

Data analysis, quality indexing and prediction of water quality for the management of rawal watershed in Pakistan

Eighth International Conference on Digital Information Management (ICDIM 2013), 2013

In contrast to managing the water quality only at the command level (where water is being consumed), one should also give importance to the water quality in the areas where water is being produced, i.e. the watersheds. The failure to do so deteriorates the water quality downstream and poses serious challenges for water managers in meeting the water quality requirements on a sustainable basis. In order to have effective water management in command areas, it is essential to assess different aspects of water quality. Rawal watershed is a relatively small watershed area which is being affected by anthropogenic activities, e.g. urbanization, deforestation, etc. In this paper, we present the last four years (2009-2012) of trends in water quality related parameters, along with month-wise as well as source-wise parametric satisfactory analysis against WHO quality standards. Moreover, we applied regression models to check the seasonal water quality trends. The quality indices were analyzed by a combination of supervised and unsupervised machine learning techniques. Different sources of fecal coliform contamination were also identified. Lastly, the possible reasons for high contamination were identified by studying the watershed land covers. Our research suggests that, in order to find the quality index of water, the Average Linkage (Within Groups) method of Hierarchical Clustering using Euclidean distance is an accurate unsupervised learning technique. Similarly, for classification, the Multi-Layer Perceptron (MLP) has been found to be the more accurate supervised learning technique. Higher values of fecal coliforms were found in the months of March, June, July, and October. Some of the possible reasons are land covers, especially scrub forest and rain-fed agriculture areas, poultry farms, and the population settled around the streams.

A semantic rules & reasoning based approach for Diet and Exercise management for diabetics

Predicting New Collaborations in Academic Citation Networks of IEEE and ACM Conferences

In this paper we study the time evolution of academic collaboration networks by predicting the appearance of new links between authors. The accurate prediction of new collaborations between members of a collaboration network can help accelerate the realization of new synergies, foster innovation, and raise productivity. For this study, the authors collected a large data set of publications from 630 conferences of the IEEE and ACM, covering more than 257,000 authors and 61,000 papers and capturing more than 818,000 collaborations spanning a period of 10 years. The data set is rich in semantic data that allows exploration of many features that were not considered in previous approaches. We considered a comprehensive set of 98 features, and after processing identified eight features as significant. Most significantly, we identified two new features as the most significant predictors of future collaborations: 1) the number of common title words, and 2) the number of common references in two authors' papers. The link prediction problem is formulated as a binary classification problem, and three different supervised learning algorithms are evaluated, i.e. Naïve Bayes, C4.5 decision tree, and Support Vector Machines. Extensive efforts are made to ensure complete spatial isolation of information used in training and test instances, which to the authors' best knowledge is unprecedented. Results were validated using a modified form of the classic 10-fold cross validation (the change was necessitated by the way training and test instances were separated). The Support Vector Machine classifier performed the best among the tested approaches; it correctly classified on average more than 80% of test instances and had an area under the receiver operating characteristic (ROC) curve greater than 0.80.
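The two features highlighted in this abstract can be pictured with a small sketch. It counts, for a pair of authors, the title words and the cited references that their papers have in common. The record layout (`titles`, `references`), the function names, the tokenization, and the stopword list are all assumptions for illustration; the abstract does not describe how the paper extracts or normalizes these features.

```python
# Assumed, illustrative stopword list; the paper's preprocessing is not given.
STOPWORDS = frozenset({"a", "an", "the", "of", "for", "in", "on", "and", "to", "using"})


def title_words(titles):
    """Collect the distinct non-stopword tokens across an author's paper titles."""
    words = set()
    for title in titles:
        for token in title.lower().split():
            token = token.strip(".,:;()")
            if token and token not in STOPWORDS:
                words.add(token)
    return words


def pair_features(author_a, author_b):
    """Count shared title words and shared references for two author records."""
    common_words = title_words(author_a["titles"]) & title_words(author_b["titles"])
    common_refs = set(author_a["references"]) & set(author_b["references"])
    return {
        "common_title_words": len(common_words),
        "common_references": len(common_refs),
    }


# Hypothetical author records, invented for this example.
a = {"titles": ["Similarity Learning for Nearest Neighbor Classification"],
     "references": ["r1", "r2", "r3"]}
b = {"titles": ["Nearest Neighbor Methods in Text Classification"],
     "references": ["r2", "r3", "r4"]}
features = pair_features(a, b)
```

Feature dictionaries like this one, computed for each candidate author pair, would form the input rows of a binary classifier that labels a pair as a future collaboration or not.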

Rot-SiLA: A Novel Ensemble Classification Approach Based on Rotation Forest and Similarity Learning Using Nearest Neighbor Algorithm

2013 12th International Conference on Machine Learning and Applications, 2013

Recent years have seen a great inclination towards Machine Learning classification, and researchers are thinking in terms of achieving accuracy and correctness. Many studies have shown that an ensemble of classifiers outperforms individual ones in terms of accuracy. Qamar et al. have developed a Similarity Learning Algorithm (SiLA) based on a combination of the k nearest neighbor algorithm and the Voted Perceptron. This approach is different from other state-of-the-art algorithms in the sense that it learns appropriate similarity metrics rather than distance-based ones for all types of datasets, i.e. textual as well as non-textual. In this paper, we present a novel ensemble classifier, Rot-SiLA, which is developed by combining the Rotation Forest algorithm and SiLA. The Rot-SiLA ensemble classifier is built upon two types of approaches: one based on standard kNN and another based on symmetric kNN (SkNN), just as was the case with the SiLA algorithm. It has been observed that the Rot-SiLA ensemble significantly outperforms other variants of the Rotation Forest ensemble as well as SiLA in experiments conducted with 14 UCI repository data sets. The significance of the results was determined by the s-test.

Dynamic entity and relationship extraction from news articles

2012 International Conference on Emerging Technologies, 2012

In structured as well as unstructured data, information extraction (IE) and information retrieval (IR) techniques are gaining popularity in order to produce a realistic output. The number of Internet users is growing day by day, and the Internet is becoming a popular source for spreading information through news, blogs, etc. A lot of quality work has been done on monitoring this information. Related to news monitoring, our proposed unsupervised machine learning approach will fetch the entities and relationships from the ...

Similarity learning in nearest neighbor, positive semi-definitiveness and RELIEF algorithm

2010 International Conference of Soft Computing and Pattern Recognition, 2010

In this paper, we develop a similarity learning version of the RELIEF algorithm, called RBS-PSD (for RELIEF-Based Similarity learning), where the learned similarity matrix is projected onto the set of positive semi-definite matrices. Unfortunately, this algorithm does not perform very well in practice since it does not try to optimize the leave-one-out error or the 0-1 loss. This motivated us to develop a stricter version, called sRBS-PSD, which aims at reducing a cost function closer to the 0-1 loss. In the case of sRBS-PSD also, the ...