Shobeir Fakhraei

University of Southern California, Information Sciences Institute, Faculty Member

University of Maryland, Computer Science, Graduate Student

Papers by Shobeir Fakhraei

Network-Based Drug-Target Interaction Prediction with Probabilistic Soft Logic

Drug-target interaction studies are important because they can predict drugs' unexpected therapeutic or adverse side effects. In silico predictions of potential interactions are valuable and can focus effort on in vitro experiments. We propose a prediction framework that represents the problem using a bipartite graph of drug-target interactions augmented with drug-drug and target-target similarity measures and makes predictions using probabilistic soft logic (PSL). Using probabilistic rules in PSL, we predict interactions with models based on triad and tetrad structures. We apply blocking techniques that make link prediction in PSL more efficient for drug-target interaction prediction. We then perform extensive experimental studies to highlight different aspects of the model and the domain, first comparing the models with different structures and then measuring the effect of the proposed blocking on prediction performance and efficiency. We demonstrate the importance of rule weight learning in the proposed PSL model and then show that PSL can effectively make use of a variety of similarity measures. We perform an experiment to validate the importance of collective inference and of using multiple similarity measures for accurate predictions, in contrast to non-collective and single-similarity assumptions. Finally, we illustrate that our PSL model achieves state-of-the-art performance with simple, interpretable rules and evaluate our novel predictions using online datasets.
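
As a rough illustration of how a triad-based rule might be scored, the sketch below uses the Lukasiewicz relaxation that PSL applies to soft truth values in [0, 1]. The rule shape (similar drugs, one known interaction, one candidate interaction), the function names, and the weight are illustrative assumptions and are not taken from the paper.

```python
def lukasiewicz_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: soft truth value of (a AND b) for a, b in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far the implication body -> head is from being satisfied (0 = satisfied)."""
    return max(0.0, body - head)


def triad_rule_penalty(similar_drugs: float,
                       interacts_known: float,
                       interacts_candidate: float,
                       weight: float = 1.0) -> float:
    """Weighted penalty for a hypothetical triad rule of the form
    SimilarDrug(D1, D2) AND Interacts(D1, T) -> Interacts(D2, T).
    All three arguments are soft truth values in [0, 1]."""
    body = lukasiewicz_and(similar_drugs, interacts_known)
    return weight * distance_to_satisfaction(body, interacts_candidate)


if __name__ == "__main__":
    # Two very similar drugs, a known interaction, and a low-scored candidate:
    # the rule is far from satisfied, so the penalty is large.
    print(triad_rule_penalty(0.9, 1.0, 0.2))
    # Dissimilar drugs: the rule body is nearly false, so there is no penalty.
    print(triad_rule_penalty(0.1, 1.0, 0.2))
```

In this sketch, inference would look for candidate interaction scores that keep the total weighted penalty over all grounded rules small.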

Data Analytics for Pharmaceutical Discoveries

Book Chapter

Predictable Dual-View Hashing

We propose a Predictable Dual-View Hashing (PDH) algorithm which embeds proximity of data samples in the original spaces. We create a cross-view Hamming space with the ability to compare information from previously incomparable domains with a notion of 'predictability'. Through comparative experimental analysis on two large datasets, PASCAL-Sentence and SUN-Attribute, we demonstrate the superiority of our method to state-of-the-art dual-view binary code learning algorithms.

Bias and Stability of Single Variable Classifiers

Feature rankings are often used for supervised dimension reduction, especially when the discriminating power of each feature is of interest, the dimensionality of the dataset is extremely high, or computational power is too limited for more complicated methods. In practice, it is recommended to start dimension reduction with simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from the capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of this feature ranking method. We study whether the classifiers influence the SVC rankings or whether the discriminative power of the features themselves has a dominant impact on the final rankings. We show that the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study whether heterogeneous classifier ensemble approaches provide less biased rankings and whether they improve final classification performance. Furthermore, we calculate the empirical loss in prediction performance, relative to the optimal choices, incurred by using the same classifier for SVC feature ranking and final classification.
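
The idea of ranking features by the performance of a one-feature classifier can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes binary labels, uses a simple threshold classifier (a decision stump) as the single-variable classifier, and scores each feature by training accuracy rather than by a held-out estimate. The names `stump_accuracy` and `svc_ranking` are hypothetical.

```python
from typing import List, Sequence, Tuple


def stump_accuracy(values: Sequence[float], labels: Sequence[int]) -> float:
    """Best training accuracy of a one-feature threshold classifier.

    Tries every observed value as a threshold and both polarities
    (predict 1 above the threshold, or predict 1 below it).
    """
    n = len(values)
    best = 0.0
    for threshold in sorted(set(values)):
        correct = sum((v >= threshold) == bool(y) for v, y in zip(values, labels))
        # (n - correct) is the number correct under the flipped polarity.
        best = max(best, correct / n, (n - correct) / n)
    return best


def svc_ranking(X: Sequence[Sequence[float]],
                y: Sequence[int]) -> List[Tuple[int, float]]:
    """Rank feature indices by single-feature classifier accuracy, best first."""
    n_features = len(X[0])
    scores = [(j, stump_accuracy([row[j] for row in X], y))
              for j in range(n_features)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 only partly.
    X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
    y = [0, 0, 1, 1]
    for index, accuracy in svc_ranking(X, y):
        print(f"feature {index}: accuracy {accuracy:.2f}")
```

Swapping the stump for a different single-feature classifier changes the scores, which is the kind of classifier-induced bias the abstract sets out to measure.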

Confident Surgical Decision Making in Temporal Lobe Epilepsy by Heterogeneous Classifier Ensembles

In medical domains with low tolerance for invalid predictions, classification confidence is highly important, and traditional performance measures such as overall accuracy cannot provide adequate insight into classification reliability. In this paper, a confident-prediction rate (CPR), which measures the upper limit of confident predictions, is proposed based on receiver operating characteristic (ROC) curves. It is shown that a heterogeneous ensemble of classifiers improves this measure. This ensemble approach has been applied to lateralization of focal epileptogenicity in temporal lobe epilepsy (TLE) and to prediction of surgical outcomes. A goal of this study is to reduce the requirement for extraoperative electrocorticography (eECoG), the practice of using electrodes placed directly on the exposed surface of the brain. We show that this goal is achievable through the application of data mining techniques. Furthermore, not all TLE surgical operations result in complete relief from seizures, and it is not always possible for human experts to identify such unsuccessful cases prior to surgery. This study demonstrates the capability of data mining techniques to predict an undesirable outcome for a portion of such cases.

Confidence in medical decision making: application in temporal lobe epilepsy data mining

Prior to neurosurgical resection of abnormal brain tissues in mTLE patients, focal points of the seizure should be identified via a set of examinations. When decisive evidence is not present in the noninvasive clinical profile of mTLE patients, extraoperative electrocorticography (ECoG) is required, which is the practice of using electrodes placed directly on the exposed surface of the brain. Through classification techniques on a dataset of mTLE patients, we have studied the possibility of reducing this requirement and shown significant results. Furthermore, we compared the performance of six well-known classifiers using the area under the receiver operating characteristic (ROC) curve (AUC) and a proposed measure of decision confidence. We have shown that in critical domains such as medicine, AUC does not provide sufficient information about the confidence of the classification and further measures are needed.

Consensus Feature Ranking in Datasets with Missing Values

Development of a feature ranking method that is based upon the discriminative power of features and is unbiased towards classifiers is of interest. We have studied a consensus feature ranking method, based on multiple classifiers, and have shown its superiority to well-known statistical ranking methods. In a target environment such as a medical dataset, missing values and an unbalanced distribution of data must be taken into consideration in the ranking and evaluation phases in order to legitimately apply a feature ranking method. In a comparison study, a Performance Index (PI) is proposed that takes into account both the number of features and the number of samples involved in the classification.
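
One simple way to combine per-classifier rankings into a consensus is to average each feature's position across the rankings. The abstract does not state which aggregation rule the paper uses, so the mean-rank rule, the function name, and the feature names below are assumptions made purely for illustration. The sketch also assumes every ranking lists the same set of features.

```python
from typing import Dict, List, Sequence


def mean_rank_consensus(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Combine several rankings (best feature first) by average position.

    Ties in average position are broken alphabetically by feature name.
    """
    totals: Dict[str, float] = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            totals[feature] = totals.get(feature, 0.0) + position
    return sorted(totals, key=lambda f: (totals[f] / len(rankings), f))


if __name__ == "__main__":
    # Hypothetical rankings produced by three different classifiers.
    per_classifier = [
        ["age", "volume", "score"],
        ["volume", "age", "score"],
        ["age", "score", "volume"],
    ]
    print(mean_rank_consensus(per_classifier))
```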

Attribute Ranking for Lateralizing Focal Epileptogenicity in Temporal Lobe Epilepsy

A consensus feature-ranking approach has been applied to the study of localization-related temporal lobe epilepsy (TLE) in order to evaluate the relative discriminative power of individual attributes. Cases were selected on the basis of a postoperative outcome free of disabling seizures (i.e., Engel class I) in order to establish a definitive laterality of focal epileptogenicity. Several quantitative measures made available by imaging and electrographic studies are considered, and the most discriminative of these are quantitatively prioritized for the lateralization of focal epileptogenicity. Cases requiring extraoperative electrocorticography were examined as a subgroup to establish whether the current method of analysis could distinguish laterality sufficiently well to avoid the requirement for intracranial electrode implantation.

Effect of classifiers in consensus feature ranking for biomedical datasets

Many informative aspects of medical datasets may be extracted from a comparative study of the features' discriminative power. Recently, consensus feature rankings have been proposed to achieve robust, unbiased and reliable rankings of attributes. We have studied the effect of classifier inclusion in a consensus feature ranking method for a medical dataset with missing values and class-imbalanced data. The ability of consensus feature rankings to demonstrate superior performance with unseen classifiers is also studied in this paper.

Aspect Extraction from Software Design Model

Aspect-oriented programming was introduced by Gregor Kiczales in 1997 to handle crosscutting concerns, that is, concerns that cannot be fully separated via object-oriented programming. Traditionally, aspect-oriented software development has focused on the software life cycle's implementation phase: developers identify and capture aspects mainly in code. But aspects are evident earlier in the life cycle, such as during requirements engineering and design.
This paper discusses issues with different approaches to handling crosscutting concerns at the requirements and design levels. A practical process for identifying and extracting aspects in a software design model is proposed. The process starts by checking the completeness of the model, adds non-functional requirements to the UML use case model, and validates the model's relationships. Crosscutting concerns are identified in the design model during the next steps, and then the behavioral specifications of the model are analyzed from an aspect identification perspective. Finally, a formula for comparing different criteria based on WMC is proposed.
Keywords: Aspect Oriented, Early Aspect, Aspect Mining, Design Model, UML, Crosscutting Concern, Process
