Journal Articles by Michael Fire

Nature Humanities and Social Sciences Communications, 2020
Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women’s roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular, albeit flawed, measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies.

GigaScience, 2020
Background
COVID-19 is the most rapidly expanding coronavirus outbreak in the past 2 decades. To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential.
Results
Here, we study the volume of research conducted on previous coronavirus outbreaks, specifically SARS and MERS, relative to other infectious diseases by analyzing >35 million articles from the past 20 years. Our results demonstrate that previous coronavirus outbreaks have been understudied compared with other viruses. We also show that the research volume of emerging infectious diseases is very high after an outbreak and decreases drastically upon the containment of the disease. This can yield inadequate research and limited investment in gaining a full understanding of novel coronavirus management and prevention.
Conclusions
Independent of the outcome of the current COVID-19 outbreak, we believe that measures should be taken to encourage sustained research in the field.

GigaScience, 2019
Abstract
Background
The academic publishing world is changing significantly, with ever-growing numbers of publications each year and shifting publishing patterns. However, the metrics used to measure academic success, such as the number of publications, citation number, and impact factor, have not changed for decades. Moreover, recent studies indicate that these metrics have become targets and follow Goodhart’s Law, according to which, “when a measure becomes a target, it ceases to be a good measure.”
Results
In this study, we analyzed >120 million papers to examine how the academic publishing world has evolved over the last century, with a deeper look into the specific field of biology. Our study shows that the validity of citation-based measures is being compromised and their usefulness is lessening. In particular, the number of publications has ceased to be a good metric as a result of longer author lists, shorter papers, and surging publication numbers. Citation-based metrics, such as citation number and h-index, are likewise affected by the flood of papers, self-citations, and lengthy reference lists. Measures such as a journal’s impact factor have also ceased to be good metrics due to the soaring numbers of papers that are published in top journals, particularly from the same pool of authors. Moreover, by analyzing properties of >2,600 research fields, we observed that citation-based metrics are not beneficial for comparing researchers in different fields, or even in the same department.
Conclusions
Academic publishing has changed considerably; now we need to reconsider how we measure success.
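For readers unfamiliar with the h-index mentioned in the abstract above, the following is a minimal sketch of its standard definition (the largest h such that h papers have at least h citations each). The function name and the citation counts are hypothetical and are not taken from the study.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for two researchers (illustrative values only).
researcher_a = [10, 8, 5, 4, 3]
researcher_b = [25, 8, 5, 3, 3]

print(h_index(researcher_a))  # 4
print(h_index(researcher_b))  # 3
```

Note that the second list has more total citations but a lower h-index, which is one small illustration of why a single citation-based number can rank researchers inconsistently.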

Elsevier Information Processing & Management, 2020
Trends change rapidly in today's world and are readily observed in lists of most important people, rankings of global companies, infectious disease patterns, political opinions, and popularities of online social networks. A key question arises: What is the mechanism behind the emergence of new trends? To answer this question, we can model real-world dynamic systems as networks, where a network is represented by a set of vertices and their corresponding links. The features and topology of these networks can then be analyzed, including how they evolve over a long period of time. However, the actual mechanisms behind these dynamic systems remain difficult to understand. Here we show the construction of the largest publicly available network evolution dataset to date, which we utilized to reveal how key entities in a network gain power. We employed state-of-the-art data science tools and extensive cloud computing resources to create this massive corpus that contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations emerged: first, links are most prevalent among vertices that join a network at a similar time; second, the rate at which new vertices join a network is a central factor in molding a network's topology; and third, the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied our learnings to develop a simple network-generation model, a flexible model based on large-scale, real-world data. Our results are applicable to dynamic systems in nature and society, and deliver a better understanding of how stars within these networks rise and fall.

Springer Journal of Social Network Analysis and Mining (SNAM), 2018
In the past decade, graph-based structures have penetrated nearly every aspect of our lives. The detection of anomalies in these networks has become increasingly important, such as in exposing infected endpoints in computer networks or identifying socialbots. In this study, we present a novel unsupervised two-layered meta-classifier that can detect irregular vertices in complex networks solely by utilizing topology-based features. Following the reasoning that a vertex with many improbable links has a higher likelihood of being anomalous, we applied our method to 10 networks of various scales, from a network of several dozen students to online networks with millions of vertices. In every scenario, we succeeded in identifying anomalous vertices with lower false positive rates and higher AUCs compared to other prevalent methods. Moreover, we demonstrated that the presented algorithm is generic, and efficient both in revealing fake users and in disclosing the influential people in social networks.

Online Social Networks (OSNs), such as Facebook and Twitter, have become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus in that each offers particular services and functionalities. Recent studies show that many OSN users create several accounts on multiple OSNs using the same or different personal information. Collecting all the available data of an individual from several OSNs and fusing it into a single profile can be useful for many purposes. In this paper, we introduce novel machine learning based methods for solving Entity Resolution (ER), the problem of matching user profiles across multiple OSNs. The presented methods are able to match two user profiles from two different OSNs based on supervised learning techniques, which use features extracted from each one of the user profiles. By using the extracted features and supervised learning techniques, we developed classifiers which can perform entity matching between two profiles for the following scenarios: (a) matching entities across two OSNs; (b) searching for a user by similar name; and (c) de-anonymizing a user's identity. The constructed classifiers were tested by using data collected from two popular OSNs, Facebook and Xing. We then evaluated the classifiers' performances using various evaluation measures, such as true and false positive rates, accuracy, and the Area Under the receiver operator Curve (AUC). The constructed classifiers were evaluated and their classification performance measured by AUC was quite remarkable, with an AUC of up to 0.982 and an accuracy of up to 95.9% in identifying user profiles across two OSNs.

IEEE Communications Surveys & Tutorials
Many online social network (OSN) users are unaware of the numerous security risks that exist in these networks, including privacy violations, identity theft, and sexual harassment, just to name a few. According to recent studies, OSN users readily expose personal and private details about themselves, such as relationship status, date of birth, school name, email address, phone number, and even home address. This information, if put into the wrong hands, can be used to harm users both in the virtual world and in the real world. These risks become even more severe when the users are children.
In this paper we present a thorough review of the different security and privacy risks which threaten the well-being of OSN users in general, and children in particular. In addition, we present an overview of existing solutions that can provide better protection, security, and privacy for OSN users.
We also offer simple-to-implement recommendations for OSN users which can improve their security and privacy when using these platforms. Furthermore, we suggest future research directions.

Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can assist in identifying various patterns in human population.
In this study, we present methods and algorithms which can assist in identifying variations in lifespan distributions of human population in the past centuries, in detecting social and genetic features which correlate with human lifespan, and in constructing predictive models of human lifespan based on various features which can easily be extracted from genealogy datasets.
We have evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 8.8 million connections, all of which were collected from the WikiTree website.
Our findings indicate that significant but small positive correlations exist between the parents' lifespan and their children's lifespan. Additionally, we found slightly higher and significant correlations between the lifespans of spouses. We also discovered a very small positive and significant correlation between longevity and reproductive success in males, and a small and significant negative correlation between longevity and reproductive success in females. Moreover, our machine learning algorithms presented better than random classification results in predicting which people who outlive the age of 50 will also outlive the age of 80.
We believe that this study will be the first of many studies which utilize the wealth of data on human populations, existing in online genealogy datasets, to better understand factors which influence human lifespan. Understanding these factors can assist scientists in providing solutions for successful aging.

Springer Networks and Spatial Economics (NETS)
Complementing the formal organizational structure of a business are the informal connections among employees. These relationships help identify knowledge hubs, working groups, and shortcuts through the organizational structure. They carry valuable information on how a company functions de facto. In the past, eliciting the informal social networks within an organization was challenging; today they are reflected by friendship relationships in online social networks. In this paper we analyze several commercial organizations by mining data which their employees have exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among employees of targeted organizations. Our results show that it is possible to identify leadership roles within the organization solely by using centrality analysis and machine learning techniques applied to the informal relationship network structure. Valuable non-trivial insights can also be gained by clustering an organization’s social network and gathering publicly available information on the employees within each cluster. Knowledge of the network of informal relationships may be a major asset or might be a significant threat to the underlying organization.
ASE Human Journal, 2012
"Today’s social networks are plagued by numerous types
of malicious profiles, ranging from bots to sexual predators. We present a novel method for the detection of these malicious profiles by only using the social network’s own topological features. The reliance on only these features ensures that the proposed method is generic enough to be applied on many types of social networks. The algorithm has been evaluated on several social networks and was found to be effective in detecting several types of malicious profiles. We believe this method is an important step towards making social networks less vulnerable to spammers, socialbots and sexual predators."

""Online social networking sites have become increasingly popular over the last few years. As a r... more ""Online social networking sites have become increasingly popular over the last few years. As a result, new interdisciplinary research directions have emerged in which social network analysis methods are applied to networks containing hundreds of millions of users. Unfortunately, links between individuals may be missing either due to an imperfect acquirement processes or because they are not yet reflected in the online network
(i.e., friends in the real world did not form a virtual connection). The primary bottleneck in link prediction techniques is extracting the structural features required for classifying links. In this paper, we propose a set of simple, easy-to-compute structural features that can be analyzed to identify missing links. We show that by using simple structural features, a machine learning classifier can successfully identify missing links, even when applied to the hard problem of classifying links between individuals with at least one common friend. We also present a method for calculating the amount of data needed in order to build more accurate classifiers. The new Friends-measure and Same-community features we developed are shown to be good predictors for missing links. An evaluation experiment was performed on ten large social network datasets: Academia.edu, DBLP, Facebook, Flickr, Flixster, Google+, Gowalla, TheMarker, Twitter, and YouTube. Our methods can provide social network site operators with the capability of helping users to find known, offline contacts and to discover new friends online. They may also be used for exposing hidden links in online social networks.""
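As a rough illustration of what "simple, easy-to-compute structural features" for link prediction can look like, the sketch below computes two textbook features, the common-neighbors count and the Jaccard coefficient, on a small hypothetical graph. These are generic stand-ins chosen for illustration; the Friends-measure and Same-community features named in the abstract are not defined here, so they are not reproduced. All function names and the toy edge list are assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def common_neighbors(graph, u, v):
    """Number of vertices adjacent to both u and v."""
    return len(graph[u] & graph[v])


def jaccard_coefficient(graph, u, v):
    """Shared neighbors divided by all distinct neighbors of u and v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


# Hypothetical toy friendship graph (not data from the paper).
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
graph = build_graph(edges)

# Feature vector for the candidate (currently missing) link a-d.
features = (common_neighbors(graph, "a", "d"), jaccard_coefficient(graph, "a", "d"))
```

In a supervised setting, feature vectors like this one would be computed for known links and known non-links and then passed to a classifier; the choice of classifier is outside the scope of this sketch.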

"The amount of personal information involuntarily exposed by users on online social networks is s... more "The amount of personal information involuntarily exposed by users on online social networks is staggering, as shown in recent research. Moreover, recent reports indicate that these networks are inundated with tens of millions of fake user profiles, which may jeopardize the user’s security and privacy. To identify fake users in such networks and to improve users’ security and privacy, we developed the Social Privacy Protector (SPP) software for Facebook. This software contains three protection layers that improve user privacy by implementing different methods to identify fake profiles. The first layer identifies a user’s friends who might pose a threat and then restricts the access these “friends” have to the user’s personal information. The second layer is an expansion of Facebook’s basic privacy settings based on different types of social network usage profiles. The third layer alerts users about the number of installed applications on their Facebook profile that has access to their private information. An initial version of the SPP software received positive media coverage, and more than 3,000 users from more than 20 countries have installed the software, out of which 527 have used the software to restrict more than 9,000 friends. In addition, we estimate that more than 100 users have accepted the software’s recommendations and removed nearly 1,800 Facebook applications from their profiles. By analyzing the unique dataset obtained by the software in combination with machine learning techniques, we developed classifiers that are able to predict Facebook profiles with a high probability of being fake and consequently threaten the user’s security and privacy. Moreover, in this study, we present statistics generated by the SPP software on both user privacy settings and the number of applications installed on Facebook profiles. 
These statistics alarmingly demonstrate how vulnerable Facebook users’ information is to both fake profile attacks and third-party Facebook applications."
This is a draft version of the article. The full version of the article was published in Social Network Analysis and Mining Journal, and can be downloaded from the following link:
http://link.springer.com/article/10.1007%2Fs13278-014-0194-4
This paper develops a methodology to aggregate signals in a network regarding some hidden state of the world. We argue that focusing on edges around hubs will under certain circumstances amplify the faint signals disseminating in a network, allowing for more efficient detection of that hidden state. We apply this method to detecting emergencies in mobile phone data, demonstrating that under a broad range of cases and a constraint in how many edges can be observed at a time, focusing on the egocentric networks around key hubs will be more effective than sampling random edges. We support this conclusion analytically, through simulations, and with analysis of a dataset containing the call log data from a major mobile carrier in a European nation.

Springer Science and Engineering Ethics Journal (accepted)
Online Social Networks (OSNs) have rapidly become a prominent and widely used service, offering a wealth of personal and sensitive information with significant security and privacy implications. Hence, OSNs are also an important - and popular - subject for research. To perform research based on real-life evidence, however, researchers may need to access OSN data, such as texts and files uploaded by users and connections among users. This raises significant ethical problems. Currently, there are no clear ethical guidelines, and researchers may end up (unintentionally) performing ethically questionable research, sometimes even when more ethical research alternatives exist. For example, several studies have employed “fake identities” to collect data from OSNs, but fake identities may be used for attacks and are considered a security issue. Is it legitimate to use fake identities for studying OSNs or for collecting OSN data for research? We present a taxonomy of the ethical challenges facing researchers of OSNs and compare different approaches. We demonstrate how ethical considerations have been taken into account in previous studies that used fake identities. In addition, several possible approaches are offered to reduce or avoid ethical misconduct. We hope this work will stimulate the development and use of ethical practices and methods in the research of online social networks.

A dimension of the Internet that has gained great popularity in recent years is the platform of online social networks (OSNs). Users all over the world write, share, and publish personal information about themselves, their friends, and their workplaces within this platform of communication. In this study we demonstrate the relative ease of creating malicious socialbots that act as social network “friends,” resulting in OSN users unknowingly exposing potentially harmful information about themselves and their places of employment. We present an algorithm for infiltrating specific OSN users who are employees of targeted organizations, using the topologies of organizational social networks and utilizing socialbots to gain access to these networks. We focus on two well-known OSNs - Facebook and Xing - to evaluate our suggested method for infiltrating key-role employees in targeted organizations. The results obtained demonstrate how adversaries can infiltrate social networks to gain access to valuable, private information regarding employees and their organizations.
Journal Articles under Review by Michael Fire

We examine the impact that homophily can have on the diffusion of a phenomenon. We identify three mechanisms from the literature by which homophily can have an effect and model how they can change diffusion that is happening through social influence. By modelling and simulation we vary the size and composition of the initial seed of adopters who start the diffusion process -- the 'critical mass' -- and test this on simulated and real data. We then use real data on personal characteristics to model genuine -- rather than simulated -- homophily. Our main contribution lies in examining the impact that the composition of the critical mass has. When the critical mass group is small, a homophilious group will cause a phenomenon to spread further than a heterophilious group. As the critical mass group grows in size a chaotic period is entered where small variations in the composition will have a huge effect on whether a group of all one type or another will cause more diffusion. As the group size continues to grow a new pattern emerges where a heterophilious group will cause more diffusion than a homophilious group. These results are discussed and avenues for future research are identified.
Refereed Conference Proceedings by Michael Fire
2020 IEEE International Smart Cities Conference (ISC2), 2020
In city planning and maintenance, the ability to quickly identify infrastructure violations - such as missing or misplaced fire hydrants - can be crucial for maintaining a safe city; it can even save lives. In this work, we aim to provide an analysis of such violations, and to demonstrate the potential of data-driven approaches for quickly locating and addressing them. We conduct an analytical study based upon data from the city of Beer-Sheva’s public records of fire hydrants, bomb shelters, and other public facilities. The results of our analysis are presented along with an interactive exploration tool, which allows for easy exploration and identification of the different facilities around the city that violate regulations.

International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 2020
With the increase in population densities and environmental awareness, public transport has become an important aspect of urban life. Consequently, large quantities of transportation data are generated, and mining data from smart card use has become a standardized method to understand the travel habits of passengers.
The increase in available data and computational power demands more sophisticated methods to analyze big data. Public transport datasets, however, often lack data integrity. Boarding stop information may be missing either due to imperfect acquisition processes or inadequate reporting. As a result, large quantities of observations and even complete sections of cities might be absent from the smart card database. We have developed a supervised machine learning method to impute missing boarding stops based on ordinal classification. In addition, we present a new metric, Pareto Accuracy, to evaluate algorithms where classes have an ordinal nature. Results are based on a case study in the city of Beer Sheva utilizing one month of data. We show that our proposed method significantly outperforms schedule-based imputation methods and can improve the accuracy and usefulness of large-scale transportation data. The implications for data imputation of smart card information are further discussed.

The 3rd IEEE International Conference on Smart Data (SmartData), 2017
Online advertising is a huge, rapidly growing advertising market in today's world. One common form of online advertising is using image ads. A decision is made (often in real time) every time a user sees an ad, and the advertiser is eager to determine the best ad to display. Consequently, many algorithms have been developed that calculate the optimal ad to show to the current user at the present time. Typically, these algorithms focus on variations of the ad, optimizing among different properties such as background color, image size, or set of images.
However, there is a more fundamental layer. Our study looks at new qualities of ads that can be determined before an ad is shown (rather than online optimization) and defines which ads are most likely to be successful.
We present a set of novel algorithms that utilize deep-learning image processing, machine learning, and graph theory to investigate online advertising and to construct prediction models which can foresee an image ad's success.
We evaluated our algorithms on a dataset with over 260,000 ad images, as well as a smaller dataset specifically related to the automotive industry, and we succeeded in constructing regression models for ad image click rate prediction.
The obtained results emphasize the great potential of using deep-learning algorithms to effectively and efficiently analyze image ads and to create better and more innovative online ads. Moreover, the algorithms presented in this paper can help predict ad success and can be applied to analyze other large-scale image corpora.

SocialCom 2013
In recent years, Online Social Networks (OSNs) have essentially become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and its own particular services and functionalities. To take advantage of the full range of services and functionalities that OSNs offer, users often create several accounts on various OSNs using the same or different personal information. Retrieving all available data about an individual from several OSNs and merging it into one profile can be useful for many purposes. In this paper, we present a method for solving the Entity Resolution (ER) problem of matching user profiles across multiple OSNs. Our algorithm is able to match two user profiles from two different OSNs based on machine learning techniques, which use features extracted from each one of the user profiles. Using supervised learning techniques and extracted features, we constructed different classifiers, which were then trained and used to rank the probability that two user profiles from two different OSNs belong to the same individual. These classifiers utilized 27 features of mainly three types: name-based features (e.g., the Soundex value of two names), general user-info-based features (e.g., the cosine similarity between two user profiles), and social network topological features (e.g., the number of mutual friends between two users’ friends lists). This experimental study uses real-life data collected from two popular OSNs, Facebook and Xing. The proposed algorithm was evaluated and its classification performance measured by AUC was 0.982 in identifying user profiles across two OSNs.
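To make two of the feature types mentioned in the abstract above more concrete, here is a minimal sketch of a cosine similarity between two profile texts and a mutual-friends count. The bag-of-words representation, the function names, and the sample profiles are all assumptions made for illustration; the abstract does not specify how the 27 features were actually computed, and matching friends across two different OSNs would in practice need its own name-matching step.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two profile texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0  # empty profile -> similarity 0


def mutual_friends(friends_a, friends_b):
    """Number of friend identifiers that appear in both friends lists."""
    return len(set(friends_a) & set(friends_b))


# Hypothetical profiles from two networks (illustrative values only).
profile_1 = {"info": "software engineer haifa", "friends": ["dana", "noam", "lior"]}
profile_2 = {"info": "software engineer haifa", "friends": ["noam", "lior", "tal"]}

pair_features = (
    cosine_similarity(profile_1["info"], profile_2["info"]),
    mutual_friends(profile_1["friends"], profile_2["friends"]),
)
```

A feature tuple like `pair_features` is the kind of input a supervised classifier could use to score whether two profiles belong to the same person.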
Uploads
Journal Articles by Michael Fire
COVID-19 is the most rapidly expanding coronavirus outbreak in the past 2 decades. To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential.
Results
Here, we study the volume of research conducted on previous coronavirus outbreaks, specifically SARS and MERS, relative to other infectious diseases by analyzing >35 million articles from the past 20 years. Our results demonstrate that previous coronavirus outbreaks have been understudied compared with other viruses. We also show that the research volume of emerging infectious diseases is very high after an outbreak and decreases drastically upon the containment of the disease. This can yield inadequate research and limited investment in gaining a full understanding of novel coronavirus management and prevention.
Conclusions
Independent of the outcome of the current COVID-19 outbreak, we believe that measures should be taken to encourage sustained research in the field.
Background
The academic publishing world is changing significantly, with ever-growing numbers of publications each year and shifting publishing patterns. However, the metrics used to measure academic success, such as the number of publications, citation number, and impact factor, have not changed for decades. Moreover, recent studies indicate that these metrics have become targets and follow Goodhart’s Law, according to which, “when a measure becomes a target, it ceases to be a good measure.”
Results
In this study, we analyzed >120 million papers to examine how the academic publishing world has evolved over the last century, with a deeper look into the specific field of biology. Our study shows that the validity of citation-based measures is being compromised and their usefulness is lessening. In particular, the number of publications has ceased to be a good metric as a result of longer author lists, shorter papers, and surging publication numbers. Citation-based metrics, such citation number and h-index, are likewise affected by the flood of papers, self-citations, and lengthy reference lists. Measures such as a journal’s impact factor have also ceased to be good metrics due to the soaring numbers of papers that are published in top journals, particularly from the same pool of authors. Moreover, by analyzing properties of >2,600 research fields, we observed that citation-based metrics are not beneficial for comparing researchers in different fields, or even in the same department.
Conclusions
Academic publishing has changed considerably; now we need to reconsider how we measure success.
In this paper we present a thorough review of the different security and privacy risks which threaten the well-being of OSN users in general, and children in particular. In addition, we present an overview of existing solutions that can provide better protection, security, and privacy for OSN users.
We also offer simple-to-implement recommendations for OSN users which can improve their security and privacy when using these platforms. Furthermore, we suggest future research directions.
In this study, we present methods and algorithms which can assist in identifying variations in the lifespan distributions of human populations over past centuries, in detecting social and genetic features which correlate with human lifespan, and in constructing predictive models of human lifespan based on various features which can easily be extracted from genealogy datasets.
We have evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 8.8 million connections, all of which were collected from the WikiTree website.
Our findings indicate that significant but small positive correlations exist between the parents' lifespan and their children's lifespan. Additionally, we found slightly higher and significant correlations between the lifespans of spouses. We also discovered a very small positive and significant correlation between longevity and reproductive success in males, and a small and significant negative correlation between longevity and reproductive success in females. Moreover, our machine learning algorithms presented better than random classification results in predicting which people who outlive the age of 50 will also outlive the age of 80.
We believe that this study will be the first of many studies which utilize the wealth of data on human populations, existing in online genealogy datasets, to better understand factors which influence human lifespan. Understanding these factors can assist scientists in providing solutions for successful aging.
of malicious profiles, ranging from bots to sexual predators. We present a novel method for the detection of these malicious profiles by only using the social network’s own topological features. The reliance on only these features ensures that the proposed method is generic enough to be applied on many types of social networks. The algorithm has been evaluated on several social networks and was found to be effective in detecting several types of malicious profiles. We believe this method is an important step towards making social networks less vulnerable to spammers, socialbots and sexual predators.
(i.e., friends in the real world did not form a virtual connection). The primary bottleneck in link prediction techniques is extracting the structural features required for classifying links. In this paper, we propose a set of simple, easy-to-compute structural features that can be analyzed to identify missing links. We show that by using simple structural features, a machine learning classifier can successfully identify missing links, even when applied to the difficult problem of classifying links between individuals with at least one common friend. We also present a method for calculating the amount of data needed in order to build more accurate classifiers. The new Friends-measure and Same-community features we developed are shown to be good predictors for missing links. An evaluation experiment was performed on ten large social network datasets: Academia.edu, DBLP, Facebook, Flickr, Flixster, Google+, Gowalla, TheMarker, Twitter, and YouTube. Our methods can provide social network site operators with the capability of helping users to find known, offline contacts and to discover new friends online. They may also be used for exposing hidden links in an online social network.
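To make the idea of simple structural features concrete, the following minimal sketch computes three pair-level features on an undirected graph stored as a dictionary of neighbor sets. The common-friends count and Jaccard coefficient are standard; `friends_measure` is only one plausible reading of the Friends-measure (the number of neighbor pairs that are either the same vertex or directly linked) and is an assumption, since the abstract does not define it. All function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as {vertex: set of neighbors}."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def common_friends(graph, u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def friends_measure(graph, u, v):
    """Assumed definition: count pairs (x, y), with x a neighbor of u and
    y a neighbor of v, such that x and y are identical or directly linked."""
    return sum(
        1
        for x in graph[u]
        for y in graph[v]
        if x == y or y in graph[x]
    )


# Toy network (illustrative only): is the link a-d missing?
g = build_graph([("a", "b"), ("a", "c"), ("b", "d"),
                 ("c", "d"), ("c", "e"), ("d", "e")])
features = [common_friends(g, "a", "d"), jaccard(g, "a", "d"),
            friends_measure(g, "a", "d")]
print(features)
```

A feature vector like this, computed for linked and unlinked vertex pairs, could then be passed to any supervised classifier; the choice of classifier is left open here.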
This is a draft version of the article. The full version of the article was published in Social Network Analysis and Mining Journal, and can be downloaded from the following link:
http://link.springer.com/article/10.1007%2Fs13278-014-0194-4
regarding employees and their organizations.
Journal Articles under Review by Michael Fire
Refereed Conference Proceedings by Michael Fire
The increase in available data and computation power demands more sophisticated methods to analyze big data. Public transport datasets, however, often lack data integrity. Boarding stop information may be missing due to either imperfect acquisition processes or inadequate reporting. As a result, large quantities of observations and even complete sections of cities might be absent from the smart card database. We have developed a supervised machine learning method to impute missing boarding stops based on ordinal classification. In addition, we present a new metric, Pareto Accuracy, to evaluate algorithms where classes have an ordinal nature. Results are based on a case study in the city of Beer Sheva utilizing one month of data. We show that our proposed method significantly outperforms schedule-based imputation methods and can improve the accuracy and usefulness of large-scale transportation data. The implications for data imputation of smart card information are further discussed.
However, there is a more fundamental layer. Our study looks at new qualities of ads that can be determined before an ad is shown (rather than online optimization) and defines which ads are most likely to be successful.
We present a set of novel algorithms that utilize deep-learning image processing, machine learning, and graph theory to investigate online advertising and to construct prediction models which can foresee an image ad's success.
We evaluated our algorithms on a dataset with over 260,000 ad images, as well as a smaller dataset specifically related to the automotive industry, and we succeeded in constructing regression models for ad image click rate prediction.
The obtained results emphasize the great potential of using deep-learning algorithms to effectively and efficiently analyze image ads and to create better and more innovative online ads. Moreover, the algorithms presented in this paper can help predict ad success and can be applied to analyze other large-scale image corpora.
have essentially become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and offering particular services and functionalities. To take advantage of the full range of services and functionalities that OSNs offer, users often create several accounts on various OSNs using the same or different personal information. Retrieving all available data about an individual from several OSNs and merging it into one profile can be useful for many purposes. In this paper, we present a method for solving the Entity Resolution (ER) problem for matching user profiles across multiple OSNs. Our algorithm is able to match two user profiles from two different OSNs using machine learning techniques applied to features extracted from each of the user profiles. Using supervised learning techniques and extracted features, we constructed different classifiers, which were then trained and used to rank the probability that two user profiles from two different OSNs belong to the same individual. These classifiers utilized 27 features of mainly three types: name-based features (e.g., the Soundex value of two names), general user-info-based features (e.g., the cosine similarity between two user profiles), and social network topology-based features (e.g., the number of mutual friends between two users’ friend lists). This experimental study uses real-life data collected from two popular OSNs, Facebook and Xing. The proposed algorithm was evaluated, and its classification performance, measured by AUC, was 0.982 in identifying user profiles across two OSNs.
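Two of the feature types mentioned above can be sketched in a few lines. The example below computes a bag-of-words cosine similarity between two profile descriptions and a mutual-friend count between two friend lists. It is a simplified illustration under stated assumptions: the tokenization (lowercased whitespace split), the function names, and the sample profiles are hypothetical, and the Soundex feature is omitted. It does not reproduce the 27 features used in the paper.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing with any other profile
    return dot / (norm_a * norm_b)


def mutual_friend_count(friends_a, friends_b):
    """Number of friend names that appear in both friend lists."""
    return len(set(friends_a) & set(friends_b))


# Hypothetical profiles of possibly the same person on two networks.
profile_1 = {"info": "data scientist tel aviv", "friends": ["dana", "omer", "noa"]}
profile_2 = {"info": "senior data scientist tel aviv", "friends": ["noa", "omer", "gil"]}

feature_vector = [
    cosine_similarity(profile_1["info"], profile_2["info"]),
    mutual_friend_count(profile_1["friends"], profile_2["friends"]),
]
print(feature_vector)
```

In a full pipeline, vectors like `feature_vector` would be computed for labeled pairs of profiles and used to train a supervised classifier that scores how likely two profiles are to belong to the same individual.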
the data discussed in this paper connects transportation and road safety research to location-based services and social network platforms.
with at least one common friend. A new friends measure that we developed is shown to be a good predictor for missing links. An evaluation experiment was performed on five large social network datasets: Facebook, Flickr, YouTube, Academia and TheMarker. Our methods can provide social network site operators with the capability of helping users to find known, offline contacts and to discover new friends online. They may also be used for exposing hidden links in an online social network.
phone, its user, and its environment. A great deal of research effort in academia and industry is put into mining this raw data for higher-level sense-making, such as understanding user context, inferring social networks, learning individual features, and so on. In many cases, this analysis work is the result of exploratory forays and trial and error. In this work we investigate the properties of learning and inference of real-world data collected via mobile phones for different sizes of analyzed networks. In particular, we examine how the ability to predict individual features and social links is incrementally enhanced with the accumulation of additional data. To accomplish this, we use the Friends and Family dataset, which contains rich data signals gathered from the smartphones of 130 adult members of a young-family residential community over the course of a year and consequently has become one of the most comprehensive mobile phone datasets gathered in academia to date. Our results show that features such as ethnicity, age and marital status can be detected by analyzing social and behavioral signals. We then investigate how the prediction accuracy increases as the user sample set grows. Finally, we propose a method for advance prediction of the maximal learning accuracy possible for the learning task at hand, based on an initial set of measurements. These predictions have practical implications, such as influencing the design of mobile data collection campaigns or evaluating analysis strategies.
patterns in the presence of some anomalous “real world event”. We argue that given limited analysis resources (namely, a limited number of network edges we can analyze), it is best to select edges that are located around ‘hubs’ in the network. We demonstrate this method using a dataset containing the call log data from a major mobile carrier in a European nation.
over the symmetric groups. Through the years this result was generalized in various ways to signed permutation groups. In this paper we present several new generalizations; in particular, we study the effect of different linear orders on the letters [−n, n] and generalize a classical result of Foata and Zeilberger
In this work we investigate the properties of learning and inference of real-world data collected via mobile phones over time. In particular, we look at the dynamic learning process over time, and how the ability to predict individual parameters and social links is incrementally enhanced with the accumulation of additional data. To do this, we use the Friends and Family dataset, which contains rich data signals gathered from the smartphones of 140 adult members of a young-family residential community for over a year, and is one of the most comprehensive mobile phone datasets gathered in academia to date.
We develop several models that predict social and individual properties from sensed mobile phone data, including detection of life partners, ethnicity, and whether a person is a student or not. Then, for this set of diverse learning tasks, we investigate how the prediction accuracy evolves over time, as new data is collected. Finally, based on the insights gained, we propose a method for advance prediction of the maximal learning accuracy possible for the learning task at hand, based on an initial set of measurements. This has practical implications, such as informing the design of mobile data collection campaigns or evaluating analysis strategies.
terror organizations investigating the relationships between suspected individuals. Unfortunately, the data mined from open sources is usually far from being complete due to the efforts of suspected and known terrorists to hide their relationships. One of the methods used to uncover missing information in social networks is referred to as link prediction. We use link prediction methods solely based on network structure analysis to infer hidden relationships among individuals and investigate their effectiveness in fractional datasets. Experiments performed on a number of closed communities extracted from organizational and public social networks show that structural link prediction retains its effectiveness even when large parts of the original social network are hidden.
(Draft Version)