Papers by Maseud Rahgozar

Nashriyyahʼi farhangʼi Khurāsān, Dec 16, 2023
Nowadays, online social networks have a great impact on people's lives and how they interact. News, sentiment, rumors, and fashion, like contagious diseases, propagate through online social networks. When information is transmitted from one person to another in a social network, a diffusion process occurs. Each node that participates in the diffusion process leaves some traces on it, such as its transmission time. In most cases, despite the visibility of these effects, the structure of the network itself is unknown. Knowing the structure of a social network is essential for many research problems, such as community detection, expert finding, influence maximization, information diffusion, sentiment propagation, and immunization against rumors. Hence, inferring the diffusion network and studying the behavior of the inferred network are important issues in social network research. In recent years, various methods have been proposed for inferring a diffusion network. A wide range of proposed models, called parametric models, assume that the propagation process follows a particular distribution. What happens in the real world is complicated and cannot easily be modeled parametrically. Moreover, models designed for large volumes of data often lack the required performance due to their high execution time. In this article, a nonparametric model is proposed that infers the underlying diffusion network. In the proposed model, all potential edges between network nodes are identified using a similarity-based link prediction method. Then, a fast graph-pruning algorithm is used to reduce the number of edges, exploiting the principle of transitive influence in social networks. The time complexity of the proposed method is O(n^3). The method was evaluated on both synthetic and real datasets. Comparison with the state of the art on different network types and various models of information cascades shows that the model achieves better precision while also decreasing execution time.
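For illustration, here is a minimal Python sketch of the pruning idea described above: under transitive influence, a direct edge a -> c is redundant whenever influence already flows a -> b -> c. This is a transitive-reduction-style pass with the stated O(n^3) cost, not the paper's exact algorithm.

```python
# Transitive-reduction-style pruning over a 0/1 adjacency matrix (list of lists).
# If influence flows a -> b and b -> c, the direct edge a -> c is dropped.
def prune_transitive(adj):
    n = len(adj)
    pruned = [row[:] for row in adj]          # work on a copy
    for a in range(n):
        for b in range(n):
            if adj[a][b]:
                for c in range(n):
                    if adj[b][c] and pruned[a][c]:
                        pruned[a][c] = 0      # a already reaches c through b
    return pruned
```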
International Journal of Computer Sciences and Engineering, 2018

DMIN, 2008
It has been found that, if a (negative) bias is applied to a substrate during the sputtering thereto of Alfesil, selective re-sputtering of aluminum and silicon from the substrate film will leave that film rich in iron and, attendantly, of higher saturation magnetization (17,000 gauss) than the starting Alfesil material (10,000 gauss). Such being the case, the invention provides that the sputtering of Alfesil-type material during the manufacture of a magnetic head be performed in two phases: first, while applying a bias of a first sense to the substrate to be sputtered upon, and, second, while applying a bias of a different sense (e.g., zero bias) to the substrate, thereby causing a composite thin film to be formed. The composite film consists of (1) a generally thin region of material with high saturation magnetization layered with (2) a generally thicker region of lesser saturation magnetization.

Many user information needs are strongly influenced by time. Some of these intents are expressed in queries issued uniformly over time; others follow a seasonal pattern. Examples of the latter are the queries "Golden Globe Award", "September 11th", or "Halloween", which refer to seasonal events that occur or have occurred on a specific occasion and for which people often search in a planned and cyclic manner. Understanding this seasonal behavior may help search engines provide better ranking approaches and respond with temporally relevant results, leading to user satisfaction. Detecting the diverse types of seasonal queries is therefore a key step for any search engine looking to present accurate results. In this paper, we categorize web search queries by their seasonality into four categories: Non-Seasonal (NS, e.g., "Secure passwords"), Seasonal related to ongoing events (SOE, e.g., "Golden Globe Award"), Seasonal related to historical events (SHE, e.g., "September 11th"), and Seasonal related to special days and traditions (SSD, e.g., "Halloween"). To classify a given query, we extract both time-series features (using the document publish dates) and content features from its relevant documents. A Random Forest classifier is then used to classify web queries by their seasonality. Our experimental results show that queries can be categorized with high accuracy.
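As a rough illustration of this pipeline, the sketch below assumes each query is represented by a 12-month histogram of its relevant documents' publish dates plus a handful of content features; the feature shapes and placeholder data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: 200 queries, a 12-month publish-date histogram per query
# (time-series features) plus 5 content features; labels are the four classes.
rng = np.random.default_rng(0)
X_time = rng.random((200, 12))
X_content = rng.random((200, 5))
y = rng.choice(["NS", "SOE", "SHE", "SSD"], 200)

X = np.hstack([X_time, X_content])            # combine both feature groups
clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```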

Information Processing and Management, Nov 1, 2020
We address the problem of finding historical questions that are semantically equivalent or relevant to an input query question in community question-answering (CQA) sites. One of the main challenges of this task is that questions are usually long and often contain peripheral information in addition to the main goal of the question. To address this problem, we propose an end-to-end Hierarchical Compare Aggregate (HCA) model that handles it without using any task-specific features. We first split questions into sentences and compare every sentence pair of the two questions using a proposed Word-Level Compare Aggregate model (WLCA-model); the comparison results are then aggregated with a proposed Sentence-Level Compare Aggregate model to make the final decision. To handle the insufficient-training-data problem, we propose a sequential transfer learning approach to pre-train the WLCA-model on a large paraphrase detection dataset. Our experiments on two editions of the SemEval benchmark datasets and the domain-specific AskUbuntu dataset show that our model outperforms the state-of-the-art models.
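The hierarchical compare-aggregate idea can be sketched with plain word embeddings: compare every sentence pair at the word level, then pool the sentence-pair scores. The cosine-alignment scoring below is a simplified stand-in for the learned WLCA-model, not the paper's architecture.

```python
import numpy as np

def wlca_score(sent_a, sent_b, emb):
    """Word-level compare-aggregate stand-in: align each word of sent_a to its
    most similar word of sent_b by cosine, then average the alignment scores."""
    A = np.array([emb[w] for w in sent_a if w in emb])
    B = np.array([emb[w] for w in sent_b if w in emb])
    if len(A) == 0 or len(B) == 0:
        return 0.0
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float((A @ B.T).max(axis=1).mean())

def hca_score(sents_a, sents_b, emb):
    """Sentence-level aggregation: score all sentence pairs with the word-level
    model and max-pool the comparison matrix into one relevance score."""
    return max(wlca_score(sa, sb, emb) for sa in sents_a for sb in sents_b)
```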

arXiv (Cornell University), Jul 27, 2018
Recently, link prediction has attracted increasing attention from various disciplines such as computer science, bioinformatics, and economics. In this problem, unknown links between nodes are discovered based on information such as network topology, profile information, and user-generated content. Most previous research has focused on the structural features of networks, although recent studies indicate that contextual information can change the network topology. There are a number of valuable studies that combine structural and content information, but they face scalability issues due to feature engineering, because the majority of the extracted features are obtained by supervised or semi-supervised algorithms. Moreover, the existing features are not general enough to perform well on different networks with heterogeneous structures. In addition, most previous work addresses only undirected and unweighted networks. In this paper, a novel link prediction framework called "DeepLink" is presented based on deep learning techniques. In contrast to previous approaches, which fail to automatically extract the best features for link prediction, deep learning reduces manual feature engineering. In this framework, both the structural and content information of the nodes are employed. The framework can use different structural feature vectors, prepared by various link prediction methods, and considers all proximity orders present in a network during structural feature learning. We have evaluated the performance of DeepLink on two real social network datasets, Telegram and irBlogs. On both datasets, the proposed framework outperforms several structural and hybrid approaches to the link prediction problem.
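A minimal sketch of the fusion idea, assuming precomputed structural feature vectors (e.g., scores from several similarity-based predictors) and content vectors per node pair; the placeholder arrays and the simple MLP stand in for DeepLink's actual deep architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
structural = rng.random((1000, 8))    # e.g., scores from several similarity indices
content = rng.random((1000, 32))      # e.g., averaged embeddings of user content
labels = rng.integers(0, 2, 1000)     # 1 = link exists, placeholder ground truth

X = np.hstack([structural, content])  # fuse structural and content information
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300).fit(X, labels)
```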
World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, Jan 24, 2008

Intelligent Data Analysis, Feb 23, 2009
Selecting optimal locations for new facilities is a critical decision in organizations that provide field-based services such as delivery, maintenance, and emergency services. The total logistics cost and the facility establishment cost are the main objectives of the location selection procedure. With the increasing size of this problem in today's applications, efficiency and scalability have become major challenges. In this paper, we study the use of spatial clustering methods to solve this problem and propose two new algorithms. The new algorithms determine the optimal locations of the new facilities as well as their optimal total count during the search process. We have conducted extensive experiments for an empirical comparative study of several spatial clustering algorithms applied to optimal facility establishment. The benchmarks use both real-world and synthetic datasets. The results reveal the advantages of the proposed algorithms and confirm that they perform better in terms of efficiency and objectives for field-based services. The higher scalability and effectiveness of the proposed algorithms make them suitable solutions for optimal facility establishment with large databases.
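A toy version of the search over facility count and placement, assuming k-means centers as candidate facility sites and a cost that sums travel distance plus a per-facility establishment cost; the paper's own clustering algorithms are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def choose_facilities(points, establish_cost, k_max=20):
    """Try k = 1..k_max facility counts; for each, place facilities at k-means
    centers and score total travel distance + per-facility establishment cost.
    Returns (best_cost, best_k, best_centers)."""
    best = None
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10).fit(points)
        travel = np.linalg.norm(points - km.cluster_centers_[km.labels_], axis=1).sum()
        cost = travel + establish_cost * k
        if best is None or cost < best[0]:
            best = (cost, k, km.cluster_centers_)
    return best
```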

arXiv (Cornell University), Aug 27, 2019
The entities of real-world networks are connected via different types of connections (i.e., layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applied to various datasets from different domains, SimBins proves to be robust and superior to the compared methods in the majority of experimented cases in terms of link prediction accuracy. Furthermore, SimBins imposes only minor computational overhead over the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.
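The inter-layer correlation that SimBins exploits can be probed with a simple binning test, sketched below under the assumption of dense similarity and adjacency matrices: bin node pairs by their similarity in one layer and measure the empirical connection probability in another.

```python
import numpy as np

def interlayer_signal(sim_a, adj_b, n_bins=10):
    """Bin node pairs by their similarity score in layer A and return the
    empirical connection probability in layer B per bin; a rising curve is
    the positive inter-layer correlation described above."""
    sims, links = sim_a.flatten(), adj_b.flatten()
    edges = np.quantile(sims, np.linspace(0, 1, n_bins + 1))
    edges[-1] += 1e-9                       # make the last bin right-inclusive
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (sims >= lo) & (sims < hi)
        probs.append(float(links[mask].mean()) if mask.any() else 0.0)
    return probs
```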
Compared with traditional association rule mining in the structured world (e.g., relational databases), mining from XML data faces more challenges due to the inherent flexibility of XML in both structure and semantics. The major challenges include: (1) a more complicated hierarchical data structure; (2) an ordered data context; and (3) a much bigger size for each data element. In order to make XML-enabled association rule mining truly practical and computationally tractable, we propose a practical model for mining association rules from XML documents and demonstrate the usability and effectiveness of the model through a set of experiments on real-life data.
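A minimal sketch of the mining step, assuming a flat XML layout with <transaction> and <item> elements (these element names are hypothetical, not from the paper): extract baskets and emit pairwise rules that clear support and confidence thresholds.

```python
import xml.etree.ElementTree as ET
from itertools import combinations
from collections import Counter

# Hypothetical layout: <transactions><transaction><item>...</item>...</transaction>...
baskets = [{item.text for item in t.findall("item")}
           for t in ET.parse("transactions.xml").getroot().findall("transaction")]

item_counts, pair_counts = Counter(), Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

n, min_support, min_conf = len(baskets), 0.1, 0.6
for (a, b), c in pair_counts.items():
    if c / n >= min_support and c / item_counts[a] >= min_conf:
        print(f"{a} -> {b}  support={c / n:.2f}  confidence={c / item_counts[a]:.2f}")
```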

SAR and QSAR in Environmental Research, Jul 25, 2016
Prediction of drug–disease associations is one of the current fields in drug repositioning and has turned into a challenging topic in pharmaceutical science. Several available computational methods use network-based and machine learning approaches to reposition old drugs for new indications. However, they often ignore features of drugs and diseases, the priority and importance of each feature, relations or interactions between features, and the degree of uncertainty. When predicting unknown drug–disease interactions, there are diverse data sources and multiple features available that can provide more accurate and reliable results. This information can be collectively mined using data fusion methods and aggregation operators; we can therefore use feature fusion to construct high-level features. We propose a computational method named scored mean kernel fusion (SMKF), which uses a new scoring method for the mean aggregation operator, called the scored mean. To predict novel drug indications, this method systematically combines multiple features related to drugs or diseases at two levels: the drug–drug level and the drug–disease level. The purpose of this study was to investigate the effect of drug and disease features as well as data fusion on predicting drug–disease interactions. The method was validated against a well-established drug–disease gold-standard dataset. When compared with the available methods, our proposed method outperformed them and competed well in performance, with an area under the curve (AUC) of 0.91, an F-measure of 84.9%, and a Matthews correlation coefficient of 70.31%.
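The scored-mean fusion step can be sketched as a score-weighted average of similarity kernels; the weighting below is a stand-in for the paper's scored-mean operator, whose exact scoring scheme is defined in the paper.

```python
import numpy as np

def scored_mean_fusion(kernels, scores):
    """Score-weighted mean of similarity kernels (same-shape square matrices).
    The weights stand in for the paper's scored-mean operator."""
    scores = np.asarray(scores, dtype=float)
    weights = scores / scores.sum()
    return sum(w * K for w, K in zip(weights, kernels))

# Usage: fuse three toy 4x4 drug-drug kernels with importance scores 3, 2, 1.
kernels = [np.eye(4), np.full((4, 4), 0.5), np.full((4, 4), 0.2)]
fused = scored_mean_fusion(kernels, scores=[3, 2, 1])
```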
Tree labeling plays a key role in XML query processing. In this paper, we propose a new labeling scheme, called Clustering-based Labeling. Unlike all previous labeling methods, in this scheme elements are separated into groups, and a label is assigned to a group of elements instead of to a single element. Based on Clustering-based Labeling, we design a new relational schema, similar to the OrdPath scheme, for storing XML documents in a relational database. Grouping sibling nodes into one record reduces the number of relational records needed to store an XML document. Our experimental results show that our storage scheme is significantly better than three well-known relational XML storage methods in terms of the number of stored records, document reconstruction time, and query processing performance.
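A sketch of the grouping idea, assuming a simple dotted-path label per sibling group (the actual Clustering-based Labeling scheme is more elaborate): one label, and hence one stored record, covers a whole group of same-named siblings.

```python
import xml.etree.ElementTree as ET

def label_sibling_groups(elem, prefix="1"):
    """Assign one dotted-path label per group of same-named siblings, so a
    whole group maps to a single stored record rather than one per element."""
    labels = {}
    groups = {}
    for child in elem:
        groups.setdefault(child.tag, []).append(child)
    for i, members in enumerate(groups.values(), start=1):
        group_label = f"{prefix}.{i}"
        labels[group_label] = members            # one record for the group
        for j, m in enumerate(members, start=1):
            labels.update(label_sibling_groups(m, f"{group_label}.{j}"))
    return labels

# labels = label_sibling_groups(ET.parse("doc.xml").getroot())
```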

The International Arab Journal of Information Technology, 2011
Although a variety of investigations have been done on human semantic interactions with Web resources, no considerable progress has been achieved. Comparison-shopping systems can be considered the latest generation of B2C e-commerce systems that connect to multiple online stores and collect the information requested by the user. In some cases, the information is extracted from online store sites through keyword search and other means of textual analysis. These processes make use of assumptions about the proximity of certain pieces of information. Such heuristic approaches are error-prone and are not always guaranteed to work. In this paper, we propose an ontology-based approach to extract product information and vendor prices from public Web pages. Most vendors on the Web present their product information in HTML documents, which are not semantic formats; our approach is nevertheless based on understanding the semantics of HTML documents and extracting the information automatically.
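A minimal sketch of ontology-driven extraction, where a tiny "ontology" maps product properties to CSS selectors for one hypothetical vendor's page layout; the selectors and class names are illustrative assumptions only.

```python
from bs4 import BeautifulSoup

# Hypothetical mini-ontology: product properties mapped to CSS selectors for
# one vendor's page layout (the selectors are illustrative assumptions).
PRODUCT_ONTOLOGY = {
    "name": "h1.product-title",
    "price": "span.price",
    "brand": "div.brand a",
}

def extract_product(html):
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for prop, selector in PRODUCT_ONTOLOGY.items():
        node = soup.select_one(selector)
        record[prop] = node.get_text(strip=True) if node else None
    return record
```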

arXiv (Cornell University), Jun 22, 2019
Networks are invaluable tools to study real biological, social, and technological complex systems in which connected elements form a purposeful phenomenon. A higher-resolution image of these systems shows that the connections are not confined to one type but span a variety of types. Multiplex networks encode this complexity with a set of nodes connected in different layers via different types of links. A large body of research on the link prediction problem is devoted to finding missing links in single-layer (simplex) networks; the proposed methods compute a similarity measure between unconnected node pairs based on the observed structure of the network. However, extending the notion of similarity to multiplex networks is a twofold challenge. The layers of real-world multiplex networks do not have the same organization, yet they are not of totally different organizations, so it must be determined how similar the layers of a multiplex network are. On the other hand, it must be determined how similar layers can contribute to the link prediction task on a target layer with missing links. Eigenvectors are known to reflect the structural features of networks well. Therefore, two layers of a multiplex network are similar with respect to structural features if they share similar eigenvectors. Experiments show that layers of real-world multiplex networks are similar with respect to structural features, and the value of similarity is far beyond that of their randomized counterparts. Furthermore, it is shown that missing links are highly predictable if their addition or removal does not significantly change the network's structural features. Otherwise, if the change is significant, a similar copy of the structural features may help. Based on this concept, the Layer Reconstruction Method (LRM) finds the best reconstruction of the observed structure of the target layer using the structural features of other, similar layers. Experiments on real multiplex networks from different disciplines show that this method benefits from information redundancy in the networks and keeps link prediction performance robust even under a high fraction of missing links.
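The layer comparison behind LRM can be sketched by aligning the leading eigenvectors of two layers' adjacency matrices; the k-eigenvector cosine score below is an illustrative simplification, not the paper's exact measure.

```python
import numpy as np

def layer_similarity(adj_a, adj_b, k=5):
    """Compare two layers via their k leading eigenvectors (np.linalg.eigh
    sorts eigenvalues ascending, so the last k columns are the leading ones).
    Absolute cosines are used because eigenvector sign is arbitrary."""
    _, vec_a = np.linalg.eigh(adj_a)
    _, vec_b = np.linalg.eigh(adj_b)
    va, vb = vec_a[:, -k:], vec_b[:, -k:]
    return float(np.abs(np.sum(va * vb, axis=0)).mean())
```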
World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, Sep 26, 2008

Scientific Reports, Jun 24, 2021
The entities of real-world networks are connected via different types of connections (i.e., layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances prediction quality in the target layer by incorporating the effect of link overlap across layers. Applying SimBins to various datasets from diverse domains, our findings indicate that SimBins outperforms the compared methods (both baseline and state-of-the-art) in most instances. Furthermore, SimBins imposes only minor computational overhead over the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.

Link prediction has been an area of interest in the research of complex networks for over two decades [1], studying the relationships between entities (nodes) in data represented as graphs. The main goal is to reveal the underlying truth behind emerging or missing connections between node pairs of a network. Link prediction methods have a wide range of applications, from the discovery of latent and spurious interactions in biological networks (which is quite costly if performed with traditional methods) [2,3] to recommender systems [4,5] and better routing in wireless mobile networks [6]. Numerous perspectives have been adopted to attack the problem of link prediction. According to similarity-based methods, the similarity between nodes determines their likelihood of linkage. This approach results from assuming that two nodes are similar if they share many common features [7]. Many node features stay hidden (or are kept hidden intentionally) in real networks. An interesting question, then, is: given that a considerable amount of information is hidden in a network, what fraction of the truth can still be extracted by merely including structural features? That is one of the main motivations for utilizing structural similarity indices for link prediction. Several classifications of similarity measures have been proposed; among them, classification based on the locality of indices is of great importance. To name a few, Common Neighbors (CN) [1], Preferential Attachment (PA) [8], Adamic-Adar (AA) [9], and Resource Allocation (RA) [10] are popular indices focusing mostly on nodes' local structural features, each with unique characteristics. Even though these indices are simple, they are popular because of their low computational cost and reasonable prediction performance. On the other hand, global indices take features of the whole network structure into account, tolerating a higher cost of computation, usually in favor of more accurate information. Take the length of paths between pairs of nodes, for instance, which the well-known Katz index [11] operates on. Average Commute Time (ACT) [1] and PageRank [12] are other notable global indices.
In between lie the quasi-local methods, which combine properties of both local and global indices: they include global information, but their computational complexity is similar to that of local methods. Examples are the Local Path (LP) index [13] and Local Random Walk (LRW) [14]. For more detailed information on these similarity indices (also described as unsupervised methods in the literature [15]), readers are advised to refer to [16]. Some researchers have tackled the link prediction problem using ideas from information theory. These works are based on the fact that the similarity of node pairs can be written in terms of the uncertainty of their connectivity. Initially, the uncertainty of connectivity can be estimated from priors; later, all structures around the unconnected node pairs can be considered as evidence to reduce the level of uncertainty about connectedness.
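For reference, here are minimal implementations of the local indices named above over a plain adjacency dict, plus the closed-form Katz index; these follow the standard definitions from the cited literature, not SimBins itself.

```python
import math
import numpy as np

# adj is a plain adjacency dict {node: set_of_neighbors}.
def common_neighbors(adj, x, y):                      # CN
    return len(adj[x] & adj[y])

def preferential_attachment(adj, x, y):               # PA
    return len(adj[x]) * len(adj[y])

def adamic_adar(adj, x, y):                           # AA
    return sum(1 / math.log(len(adj[z]))
               for z in adj[x] & adj[y] if len(adj[z]) > 1)

def resource_allocation(adj, x, y):                   # RA
    return sum(1 / len(adj[z]) for z in adj[x] & adj[y])

def katz(A, beta=0.05):                               # Katz, A = adjacency matrix
    # S = sum_l beta^l * A^l = (I - beta*A)^-1 - I; needs beta < 1/lambda_max.
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)
```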
Discriminative protein data patterns, if defined correctly, can be used in many beneficial applications such as molecular medicine, agriculture, and microbial genome applications. Prediction of protein folding patterns, by which the function of a protein ...

World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, Jul 23, 2008
One important problem in today's organizations is the existence of non-integrated information systems, inconsistency, and a lack of suitable correlations between legacy and modern systems. One main solution is to transfer the local databases into a global one. In this regard, we need to extract the data structures from the legacy systems and integrate them with the new-technology systems. In legacy systems, huge amounts of data are stored in legacy databases. They require particular attention since they need extra effort to be normalized, reformatted, and moved to modern database environments. Designing the new integrated (global) database architecture and applying reverse engineering requires data normalization. This paper proposes the use of database reverse engineering to integrate legacy and modern databases in organizations. The suggested approach consists of methods and techniques for generating the data transformation rules needed for data structure normalization.
Expert Systems with Applications

Knowledge-Based Systems, 2018
Social network analysis provides meaningful information about the behavior of network members that can be used for diverse applications such as classification and link prediction. However, network analysis is computationally expensive because of the feature learning required for different applications. In recent years, much research has focused on feature learning methods for social networks. Network embedding represents the network in a lower-dimensional space while preserving its properties, yielding a compressed representation of the network. In this paper, we introduce a novel network embedding algorithm named "CARE" that can be used for different types of networks, including weighted, directed, and complex ones. Current methods try to preserve the local neighborhood information of nodes, whereas the proposed method utilizes both local neighborhood and community information of network nodes to cover both the local and global structure of social networks. CARE builds customized paths, consisting of local and global structural information about network nodes, as a basis for network embedding and uses the Skip-gram model to learn representation vectors of nodes. Subsequently, stochastic gradient descent is applied to optimize our objective function and learn the final representations. Our method scales when new nodes are appended to the network, without information loss. Parallelized generation of the customized random walks is also used to speed CARE up. We evaluate the performance of CARE on multi-label classification and link prediction tasks. Experimental results on various networks indicate that the proposed method outperforms others in both Micro-F1 and Macro-F1 measures for different sizes of training data.
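The customized-path idea can be sketched as a walk that mostly steps to random neighbors (local structure) but occasionally jumps within the node's community (global structure); the mixing parameter and community map below are assumptions, with gensim's Skip-gram (sg=1) learning the node vectors.

```python
import random
from gensim.models import Word2Vec

def customized_walk(adj, community, start, length=40, p_comm=0.3):
    """Mostly step to a random neighbor (local structure); with probability
    p_comm jump to a random node of the same community (global structure).
    p_comm and the community map are illustrative assumptions."""
    walk = [start]
    for _ in range(length - 1):
        cur = walk[-1]
        if random.random() < p_comm and community.get(cur):
            walk.append(random.choice(community[cur]))
        elif adj[cur]:
            walk.append(random.choice(list(adj[cur])))
        else:
            break
    return [str(node) for node in walk]

# walks = [customized_walk(adj, community, v) for v in adj for _ in range(10)]
# model = Word2Vec(walks, vector_size=128, window=5, sg=1, min_count=0)
```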