HAL (Le Centre pour la Communication Scientifique Directe), Jun 21, 2022
Word embedding methods represent words as vectors in a space structured by word co-occurrences, so that words with close meanings are close in this space. These vectors are then provided as input to automatic systems to solve natural language processing problems. Because interpretability is a necessary condition for trusting such systems, the interpretability of embedding spaces, the first link in the chain, is an important issue. In this paper, we thus evaluate the interpretability of vectors extracted with two approaches: SPINE, a k-sparse auto-encoder, and SINr, a graph-based method. This evaluation is based on a Word Intrusion Task with human annotators. It is carried out on a large French corpus and is thus, as far as we know, the first large-scale experiment on word embedding interpretability for this language. Furthermore, contrary to the approaches adopted in the literature, where the evaluation is performed on a small sample of frequent words, we consider a more realistic use case where most of the vocabulary is kept for the evaluation. This shows how difficult the task is, even though SPINE and SINr show some promising results. In particular, SINr results are obtained with a very low amount of computation compared to SPINE, while being similarly interpretable.
The influence of authors is mostly based on their capacity to form specific self-contained and/or active research communities or topics while also inspiring fruitful spin-off research derived from those communities or topics. Accurately estimating author influence and its indirect effect in inspiring external research creativity must thus be based on very precise community roles as well as topic-role analysis on citing material. In this paper we thus implement, compare, and combine two different kinds of knowledge mapping approaches used to track authors' influence over time: a recent topic-based mapping approach and an original hybrid community-based mapping approach. The experimental context of our study is the analysis of the dynamics and of the influence of the research of Prof. Liu Zeyuan, the most important contributor in the field of Science of Science in China. This analysis focuses on papers citing Professor Liu Zeyuan's research and highlights the full extent of his creative thinking.
The sociologist Bourdieu defines social capital as "the set of actual or potential resources that are linked to the possession of a durable network of relationships". On Twitter, subscriptions, mentions and retweets create a network of relationships for each user, whose resources are obtaining relevant information, the possibility of being read, satisfying a narcissistic need, and spreading messages efficiently. Some Twitter users, called social capitalists, seek to maximize their number of subscriptions in order to maximize their social capital. We introduce their techniques, which are based on exchanging subscriptions and using dedicated hashtags. To study them better, we first detail a method that detects these users at the scale of the network, based on the accounts they follow and their followers. Then, we show with an automated Twitter account that these techniques make it possible to gain followers efficiently and...
HAL (Le Centre pour la Communication Scientifique Directe), Jun 14, 2021
After the rise of Word2vec came the BERT era, with large architectures that deal with polysemy by taking contextual information into account, leading to great performance improvements on classic NLP tasks. BERT systems are considered universal: they can be fine-tuned to address any task efficiently. However, these systems are huge to deploy, not trivial to fine-tune, and may not fit some corpora, e.g. small, domain-specific ones. For instance, we consider the DEFT 2018 corpus of tweets and show that CamemBERT is not well suited to this corpus and task. Following the Occam's razor principle, we thus designed MiniBERT, a tiny BERT architecture that includes a simplified self-attention mechanism and requires neither pre-training nor external data. We show that this easily trainable and deployable system obtains encouraging results on DEFT, whilst providing interpretable results.
In this paper we consider the community detection problem from two different perspectives. We first want to be able to compute communities for large directed networks, containing millions of vertices and billions of arcs. Moreover, in a large number of applications, the graphs modeling such networks are directed. Nevertheless, one is often forced to ignore the direction of the connections, either for the sake of simplicity or because no other options are available. This is in particular the case for large networks, since only a few scalable algorithms currently exist. We thus turn our attention to one of the most famous scalable algorithms, namely Louvain's algorithm [3], based on modularity maximization. We modify Louvain's algorithm to handle directed networks based on the notion of directed modularity defined by Leicht and Newman [13], and provide an empirical and theoretical study to show that one should prefer directed modularity. To illustrate this fact, we use the LFR benchmarks by Lancichinetti and Fortunato [8] to design an evaluation benchmark of directed graphs with community structure. We also give some examples and insights on the situations where one should really consider direction when maximizing modularity. Finally, for the sake of completeness, we compare the results obtained with Oslom [12], one of the best algorithms to detect communities in directed networks. While the results obtained with such an algorithm are far better on the LFR benchmarks, we emphasize that it is still not well suited to very large networks.
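As an illustration of the quantity being maximized, the directed modularity of Leicht and Newman can be computed for a fixed partition as the fraction of arcs falling inside communities minus the fraction expected from out- and in-degrees. The following is a minimal sketch, not the modified Louvain algorithm from the paper: the function name `directed_modularity`, the arc-list input and the unweighted, simple-digraph setting are assumptions made for the example.

```python
from collections import defaultdict


def directed_modularity(arcs, community):
    """Directed modularity of a fixed partition (illustrative sketch).

    Q = (1/m) * sum_ij [A_ij - k_i^out * k_j^in / m] * delta(c_i, c_j)

    arcs: iterable of (source, target) pairs, one per arc (unweighted).
    community: dict mapping every node to its community label.
    """
    arcs = list(arcs)
    m = len(arcs)
    if m == 0:
        raise ValueError("graph has no arcs")

    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    inside = 0  # number of arcs whose endpoints share a community
    for u, v in arcs:
        out_deg[u] += 1
        in_deg[v] += 1
        if community[u] == community[v]:
            inside += 1

    # Total out-degree and in-degree of each community.
    out_sum = defaultdict(int)
    in_sum = defaultdict(int)
    for node, label in community.items():
        out_sum[label] += out_deg[node]
        in_sum[label] += in_deg[node]

    expected = sum(out_sum[c] * in_sum[c] for c in out_sum) / (m * m)
    return inside / m - expected


# Two directed 3-cycles joined by a single arc 2 -> 3.
arcs = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = directed_modularity(arcs, partition)
```

Grouping the degree sums per community avoids the quadratic loop over node pairs, which matters at the scale of networks the abstract targets.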
In this paper, we aim to give insights into the self-organization of scientific collaboration. To that end, we describe a new framework to monitor the evolution of a collaboration graph that models the co-authorship of research papers. We use the community structure of the network as a high-level description of its self-organization and thus consider the evolution of the communities across time. To monitor this evolution, we describe a diachronic analysis method based on the extraction of prevalent nodes for each community. We apply this approach to data from the ISTEX project, a scientific digital library that contains so far more than 16 million documents, and present some preliminary results and visualizations.
Introduced in the context of machine learning, the Feature F-measure is a parameter-free statistical feature selection metric that describes classes through a set of salient features. It has been shown to be effective for classification, cluster labeling and clustering model quality measurement. In this paper, we introduce the Node F-measure, its transposition to the context of networks, where it can by analogy be applied to detect salient nodes in communities. This approach benefits from the parameter-free nature of the Feature F-measure, its low computational complexity and its well-evaluated performance. Interestingly, we show that in addition to these properties, the Node F-measure is correlated with certain centrality measures, and with measures designed to characterize the community roles of nodes. We also observe that the usual community role measures are strongly dependent on the size of the communities, whereas the ones we propose are by definition linked to the density of the community. This makes their results comparable from one network to another. Finally, the parameter-free selection process applied to nodes allows for a universal system, contrary to the thresholds previously defined empirically for the establishment of community roles. These results may have applications regarding leadership in scientific communities or the temporal monitoring of communities.
This software aims at studying social capitalists, a specific type of user of social networking services such as Twitter. The tool is generic, so it can be applied to completely different systems, as long as they can be represented as directed networks (i.e. digraphs). We applied our tool to Twitter in several research papers (see readme file). More precisely, the tool performs the following tasks on a network represented as an edgelist (the exact format is described later): Process social capitalism indices, which allow identifying social capitalists, as well as the type of strategy they use to improve their visibility on the network; Detect the community structure of the considered network, through various existing algorithms; Process various kinds of community role measures, aiming at describing the position of each node in the network from a community perspective (i.e. meso-scale); Perform a cluster analysis of these community role values, in ord...
We present a pretopology-based approach to extract ego-centered communities. Classical methods often consider only one structural feature of the network, whereas pretopology enables multi-criteria analysis. Our approach consists in learning a logical combination of network descriptors to define a pretopological space. Ego-centered communities are extracted by computing the elementary closure of each node. The quality of such communities is evaluated against the ground-truth communities. We show the benefits of our method by comparing it to others on both real and synthetic networks.
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 2015
On Twitter, social capitalists use dedicated hashtags and mutual subscriptions in order to gain followers and to be retweeted. Their methods are successful enough to make them appear as influential users. Indeed, applications dedicated to influence measurement, such as Klout and Kred, give high scores to most of these users. Yet their high numbers of retweets and followers are not due to the relevance of the content they tweet, but to their social capitalism techniques. To detect these users, we train a classifier using a dataset of social capitalists and regular users. We then implement this classifier in a web application that we call DDP. DDP allows users to test whether a Twitter account is a social capitalist and to visualize the data we use to make the prediction. DDP allows administrators to crawl data from many users automatically. Furthermore, administrators can manually label Twitter accounts as social capitalists or regular users to add them to the dataset. Finally, administrators can train new classifiers that take into account the Twitter accounts newly added to the dataset, so that the classifier evolves with the recently collected data. The web application is thus a way to collect data, to develop the knowledge about social capitalists, and to keep detecting them efficiently.
2015 Second European Network Intelligence Conference, 2015
In this paper, we investigate the issue of detecting the real-life influence of people based on their Twitter account. We propose an overview of common Twitter features used to characterize such accounts and their activity, and show that these are ineffective in this context. In particular, the numbers of retweets and followers and the Klout score are not relevant to our analysis. We thus propose several Machine Learning approaches based on Natural Language Processing and Social Network Analysis to label Twitter users as Influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated on the CLEF RepLab 2014 dataset, and outperform state-of-the-art ranking methods.
In the context of Twitter, social capitalists are specific users trying to increase their number of followers and interactions by any means. These users are not healthy for the service, because they are either spammers or real users distorting the notions of influence and visibility. Studying their behavior and understanding their position in Twitter is thus of great interest. It is also necessary to analyze how their methods actually affect user visibility. Based on a recently proposed method for identifying social capitalists, we tackle both points by studying how they are organized, and how their links spread across the Twitter follower-followee network. To that end, we consider their position in the network w.r.t. its community structure. We use the concept of community role of a node, which describes its position in a network depending on its connectivity at the community level. However, the topological measures originally defined to characterize these roles consider only certain aspects of the community-related connectivity, and rely on a set of empirically fixed thresholds. We first show the limitations of these measures, before extending and generalizing them. Moreover, we use an unsupervised approach to identify the roles, in order to provide more flexibility with respect to the studied system. We then apply our method to the case of social capitalists and show they are highly visible on Twitter, due to the specific roles they hold.
Social networks such as Twitter or Facebook are part of the phenomenon called Big Data. The graphs modeling their users and the links between them contain tens of millions of vertices and billions of arcs. Processing them efficiently in order to analyze their topology remains a major challenge. In this extended abstract, we propose a solution to deal with such graphs, and we study the detection and the characterization of a small community of Twitter users, called social capitalists.
Social capitalists are social media users taking advantage of various methods to maximize their visibility. This results in artificially important accounts, in the sense that this importance is not backed by any real content. The risk is then that those accounts hide relevant content and therefore prevent other users from accessing it. In this work, we want to characterize social capitalists from a purely topological perspective, i.e. without considering the nature of the shared content. For this purpose, we use the notion of community role, based on the community structure of the studied network. We modify some measures previously designed for this purpose, and propose an objective method to determine roles. We then apply this method to the analysis of a Twitter network. Our results show that the roles identified through our measures are particularly consistent with Twitter's social capitalists, whose behavior is clearly identified.
In this paper, we focus on the detection and behavior of social capitalists, a special kind of user on Twitter. Roughly speaking, social capitalists follow users regardless of their content, just hoping to increase their number of followers. They were first introduced by Ghosh et al. (Proceedings of the 21st international conference on World Wide Web, WWW'12, pp. 61-70, 2012). In this work, we provide a method to detect these users efficiently. Our algorithms do not rely on the tweets posted by the users, just on the topology of the Twitter graph. Then, we show that these users form a highly connected group in the network by studying their neighborhoods and their local clustering coefficient (Watts and Strogatz, Nature 393 (6684):440-442, 1998). We next study the evolution of such users between 2009 and 2013. Finally, we provide a behavioral analysis based on social capitalists that tweet on a special hashtag. Our work emphasizes that such users, who act like automatic accounts, are in fact mostly real users.
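For readers unfamiliar with the local clustering coefficient of Watts and Strogatz, the sketch below computes it for one node as the fraction of pairs of its neighbors that are themselves connected. It assumes an undirected graph stored as a dict of neighbor sets; treating the Twitter graph as undirected, as well as the name `local_clustering`, are assumptions of this example and not details given in the abstract.

```python
def local_clustering(adj, node):
    """Local clustering coefficient of `node` (illustrative sketch).

    adj: dict mapping each node to the set of its neighbors; the graph is
    assumed undirected, so v in adj[u] implies u in adj[v].
    Returns the number of links among the neighbors of `node` divided by
    the number of possible links, k * (k - 1) / 2.
    """
    neighbors = adj[node] - {node}  # ignore a possible self-loop
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to connect
    # Each link between two neighbors is seen once from each endpoint,
    # so this sum equals twice the number of links among neighbors.
    twice_links = sum(len(adj[u] & neighbors) for u in neighbors)
    return twice_links / (k * (k - 1))


# A triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = local_clustering(adj, 0)
c1 = local_clustering(adj, 1)
c3 = local_clustering(adj, 3)
```

A group whose members have coefficients close to 1 is close to a clique, which is the sense in which a set of users can be called highly connected.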
Social networks such as Twitter or Facebook are part of the phenomenon called Big Data, a term used to describe very large and complex data sets. The connections between the users of these networks can easily be represented using (directed) graphs. In this paper, we focus on two different aspects of social network analysis. First, our goal is to find an efficient and high-level way to store and process a social network graph, using reasonable computing resources (processor and memory). We believe that this is an important research topic, since it provides a more democratic method to deal with large graphs. Next, we turn our attention to the study of social capitalists, a specific kind of user on Twitter. Roughly speaking, such users try to gain visibility by following other users regardless of their content. Using two similarity measures called overlap index and ratio, we show that such users may be detected and classified very efficiently.
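The abstract names the two measures without defining them, so the following is only a sketch under stated assumptions: the overlap index is taken to be the standard overlap coefficient between a user's follower set and followee set, and the ratio is assumed to be the number of followees divided by the number of followers. The function names and the handling of empty sets are likewise choices made for this example.

```python
def overlap_index(followers, followees):
    """Overlap coefficient |A & B| / min(|A|, |B|) of two sets of accounts.

    Assumption: A is the set of followers and B the set of followees.
    """
    smaller = min(len(followers), len(followees))
    if smaller == 0:
        return 0.0  # convention chosen here for empty sets
    return len(followers & followees) / smaller


def ratio(followers, followees):
    """Assumed definition: number of followees per follower."""
    if not followers:
        # convention chosen here: infinite unless both sets are empty
        return float("inf") if followees else 0.0
    return len(followees) / len(followers)


# Hypothetical account with 3 followers and 4 followees, 2 of them mutual.
followers = {"alice", "bob", "carol"}
followees = {"bob", "carol", "dave", "erin"}
o = overlap_index(followers, followees)
r = ratio(followers, followees)
```

Under these assumptions, an overlap index near 1 means that almost every relationship is reciprocated, which matches the follow-exchange behavior described for social capitalists.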
HAL (Le Centre pour la Communication Scientifique Directe), Jun 21, 2022
Word embedding methods allow to represent words as vectors in a space that is structured using wo... more Word embedding methods allow to represent words as vectors in a space that is structured using word co-occurrences so that words with close meanings are close in this space. These vectors are then provided as input to automatic systems to solve natural language processing problems. Because interpretability is a necessary condition to trusting such systems, interpretability of embedding spaces, the first link in the chain is an important issue. In this paper, we thus evaluate the interpretability of vectors extracted with two approaches: SPINE, a k-sparse auto-encoder, and SINr, a graph-based method. This evaluation is based on a Word Intrusion Task with human annotators. It is operated using a large French corpus, and is thus, as far as we know, the first large-scale experiment regarding word embedding interpretability on this language. Furthermore, contrary to the approaches adopted in the literature where the evaluation is performed on a small sample of frequent words, we consider a more realistic use-case where most of the vocabulary is kept for the evaluation. This allows to show how difficult this task is, even though SPINE and SINr show some promising results. In particular, SINr results are obtained with a very low amount of computation compared to SPINE, while being similarly interpretable.
The influence of authors is mostly based on their capacity to form specific self-contained and/or... more The influence of authors is mostly based on their capacity to form specific self-contained and/or active research communities or topics while also inspiring fruitful spin-off research derived from those communities or topics. Accurately estimating author influence and its indirect effect in inspiring external research creativity must thus be based on very precise community roles as well as topic-role analysis on citing material. In this paper we thus implement, compare, and combine two different kinds of knowledge mapping approaches used to track authors’ influence over time—a recent topic-based mapping approach and an original hybrid community-based mapping approach. The experimental context of our study is the analysis of the dynamics and of the influence of the research of Prof. Liu Zeyuan, the most important contributor in the field of Science of Science in China. This analysis focuses on papers citing Professor Liu Zeyuan's research and highlights the full extent of his creative thinking.
Le sociologue Bourdieu définit le capital social comme : "L’ensemble des ressources actuelle... more Le sociologue Bourdieu définit le capital social comme : "L’ensemble des ressources actuelles ou potentielles qui sont liées à la possession d’un réseau durable de relations". Sur Twitter, les abonnements, mentions et retweets créent un réseau de relations pour chaque utilisateur dont les ressources sont l’obtention d’informations pertinentes, la possibilité d’être lu, d’assouvir un besoin narcissique, de diffuser efficacement des messages.Certains utilisateurs Twitter -appelés capitalistes sociaux - cherchent à maximiser leur nombre d’abonnements pour maximiser leur capital social. Nous introduisons leurs techniques, basées sur l’échange d’abonnements et l’utilisation de hashtags dédiés. Afin de mieux les étudier, nous détaillons tout d’abord une méthode pour détecter à l’échelle du réseau ces utilisateurs en se basant sur leurs abonnements et abonnés. Puis, nous montrons avec un compte Twitter automatisé que ces techniques permettent de gagner efficacement des abonnés et...
HAL (Le Centre pour la Communication Scientifique Directe), Jun 14, 2021
After the rise of Word2vec came the BERT era, with large architectures allowing to deal with poly... more After the rise of Word2vec came the BERT era, with large architectures allowing to deal with polysemy by taking into account the contextual information, leading to great performance improvement on classic nlp tasks. BERT systems are considered universal: they can be fine-tuned to address any task efficiently. However, these systems are huge to deploy, not trivial to fine-tune, and may not be fitted to some corpora, e.g. domain-specific and small ones. For instance, we consider the deft 2018 corpus of tweets and show that CamemBERT is not appropriate to this corpus and task. According to the Occam's razor principle, we thus designed MiniBERT, a tiny BERT architecture that includes a simplified self-attention mechanism and does require neither pre-training, nor external data. We show that this easily trainable and deployable system obtains encouraging results on deft, whilst providing interpretable results.
In this paper we consider the community detection problem from two different perspectives. We fir... more In this paper we consider the community detection problem from two different perspectives. We first want to be able to compute communities for large directed networks, containing million vertices and billion arcs. Moreover, in a large number of applications, the graphs modelizing such networks are directed. Nevertheless, one is often forced to forget the direction between the connections, either for the sake of simplicity or because no other options are available. This is in particular the case on large networks, since there are only a few scalable algorithms at the time. We thus turn our attention to one of the most famous scalable algorithms, namely Louvain's algorithm [3], based on modularity maximization. We modify Louvain's algorithm to handle directed networks based on the notion of directed modularity defined by Leicht and Newman [13], and provide an empirical and theoretical study to show that one should prefer directed modularity. To illustrate this fact, we use the LFR benchmarks by Lancichinetti and Fortunato [8] to design an evaluation benchmark of directed graphs with community structure. We also give some examples and insights on the situations where one should really consider direction when maximizing modularity. Finally, for the sake of completeness, we compare the results obtained with Oslom [12], one of the best algorithms to detect communities in directed networks. While the results obtained with such an algorithm are by far better on the LFR benchmarks, we emphasize that it is still not well-suited to deal with very large networks.
In this paper, we aim to give insights about the self-organization of scientific collaboration. T... more In this paper, we aim to give insights about the self-organization of scientific collaboration. To that aim, we describe a new framework to monitor the evolution of a collaboration graph that models the co-authorship of research papers authors. We use community structure of the network as a high-level description of its self-organization and thus consider the evolution of the communities across time. To monitor this evolution, we describe a diachronic analysis method based on the extraction of prevalent nodes for each community. We apply this approach on data issued from the ISTEX project, a scientific digital library that contains so far more than 16 million documents and present some preliminary results and visualizations.
Introduced in the context of machine learning, the Feature F-measure is a statistical feature sel... more Introduced in the context of machine learning, the Feature F-measure is a statistical feature selection metric without parameters that allows to describe classes through a set of salient features. It was shown efficient for classification, cluster labeling and clustering model quality measurement. In this paper, we introduce the Node F-measure, its transposition in the context of networks, where it can by analogy be applied to detect salient nodes in communities. This approach benefits from the parameter-free system of Feature F-Measure, its low computational complexity and its well-evaluated performance. Interestingly, we show that in addition to these properties, Node F-measure is correlated with certain centrality measures, and with measures designed to characterize the community roles of nodes. We also observe that the usual community roles measures are strongly dependent from the size of the communities whereas the ones we propose are by definition linked to the density of the community. This hence makes their results comparable from one network to another. Finally, the parameter-free selection process applied to nodes allows for a universal system, contrary to the thresholds previously defined empirically for the establishment of community roles. These results may have applications regarding leadership in scientific communities or when considering temporal monitoring of communities.
This software aims at studying social capitalists, which are a specific type of users of social n... more This software aims at studying social capitalists, which are a specific type of users of social networks services such as Twitter. The tool is generic, so it can actually be applied to completely different systems, as long as they can be represented as directed networks (i.e. digraphs). We applied our tool to Twitter in several research papers (see readme file). To be more precise, the tool allows to perform the following tasks, on a network represented as an edgelist (the exact format is described later): Process social capitalism indices, which allows identifying social capitalists, as well as the type of strategy they use to improve their visibility on the network; Detect the community structure of the considered network, through various existing algorithms; Process various kinds of community role measures, aiming at describing the position of each node in the network from a community perspective (i.e. meso-scale); Perform a cluster analysis of these community role values, in ord...
The use of general descriptive names, registered names, trademarks, service marks, etc. in this p... more The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
We present a pretopological based approach to extract ego-centered communities. Classical methods... more We present a pretopological based approach to extract ego-centered communities. Classical methods often consider only one structural feature of the network, whereas pretopology enables to do multi-criteria analysis. Our approach consists in learning a logical combination of network’s descriptors to define a pretopological space. Ego-centered communities are extracted by computing the elementary closure of each node. The quality of such communities is evaluated against the ground truth communities. We show the benefits of our method by comparing it to others on both real and synthetic networks.
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 2015
On Twitter, social capitalists use dedicated hashtags and mutual subscriptions to each other in o... more On Twitter, social capitalists use dedicated hashtags and mutual subscriptions to each other in order to gain followers and to be retweeted. Their methods are successful enough to make them appear as influent users. Indeed, applications dedicated to the influence measurement such as Klout and Kred give high scores to most of these users. Meanwhile, their high number of retweets and followers are not due to the relevance of the content they tweet, but to their social capitalism techniques. In order to be able to detect these users, we train a classifier using a dataset of social capitalists and regular users. We then implement this classifier in a web application that we call DDP. DDP allows users to test whether a Twitter account is a social capitalist or not and to visualize the data we use to make the prediction. DDP allows administrator to crawl data from a lot of users automatically. Furthermore, administrators can manually label Twitter accounts as social capitalists or regular users to add them into the dataset. Finally, administrators can train new classifiers in order to take into account the new Twitter accounts added to the dataset, and thus making evolve the classifier with these new recently collected data. The web application is thus a way to collect data, make evolve the knowledge about social capitalists and to keep detecting them efficiently.
2015 Second European Network Intelligence Conference, 2015
In this paper, we investigate the issue of detecting the real-life influence of people based on their Twitter account. We propose an overview of common Twitter features used to characterize such accounts and their activity, and show that these are ineffective in this context. In particular, retweet and follower counts and the Klout score are not relevant to our analysis. We thus propose several Machine Learning approaches based on Natural Language Processing and Social Network Analysis to label Twitter users as Influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated on the CLEF RepLab 2014 dataset, and outperform state-of-the-art ranking methods.
In the context of Twitter, social capitalists are specific users trying to increase their number of followers and interactions by any means. These users are not healthy for the service, because they are either spammers or real users distorting the notions of influence and visibility. Studying their behavior and understanding their position in Twitter is thus of significant interest. It is also necessary to analyze how their methods actually affect user visibility. Based on a recently proposed method for identifying social capitalists, we tackle both points by studying how they are organized, and how their links spread across the Twitter follower-followee network. To that end, we consider their position in the network with respect to its community structure. We use the concept of the community role of a node, which describes its position in a network depending on its connectivity at the community level. However, the topological measures originally defined to characterize these roles consider only certain aspects of the community-related connectivity, and rely on a set of empirically fixed thresholds. We first show the limitations of these measures, before extending and generalizing them. Moreover, we use an unsupervised approach to identify the roles, in order to provide more flexibility with respect to the studied system. We then apply our method to the case of social capitalists and show that they are highly visible on Twitter, due to the specific roles they hold.
Social networks such as Twitter or Facebook are part of the phenomenon called Big Data. The graphs modeling their users and the links between them contain tens of millions of vertices and several billion arcs. Processing them efficiently in order to analyze their topology remains a major challenge. In this extended abstract, we propose a solution for handling such graphs, and we study the detection and characterization of a small community of Twitter users called social capitalists.
Social capitalists are social media users, on services such as Twitter, who take advantage of various methods to maximize their visibility. This results in artificially important accounts, in the sense that this importance is not backed by any real content. The risk is that those accounts hide relevant content and therefore prevent other users from accessing it. In this work, we characterize social capitalists from a purely topological perspective, i.e. without considering the nature of the shared content. For this purpose, we use the notion of community role, based on the community structure of the studied network. We modify some measures previously designed for this purpose, and propose an objective method to determine roles. We then apply this method to the analysis of a Twitter network. Our results show that the roles identified through our measures are particularly consistent with Twitter's social capitalists, whose behavior is clearly identified.
In this paper, we focus on the detection and behavior of social capitalists, a special kind of user on Twitter. Roughly speaking, social capitalists follow users regardless of their content, just hoping to increase their number of followers. They were first introduced by Ghosh et al. (Proceedings of the 21st international conference on World Wide Web, WWW'12, pp. 61-70, 2012). In this work, we provide a method to detect these users efficiently. Our algorithms do not rely on the tweets posted by the users, just on the topology of the Twitter graph. Then, we show that these users form a highly connected group in the network by studying their neighborhoods and their local clustering coefficient (Watts and Strogatz, Nature 393 (6684):440-442, 1998). We next study the evolution of such users between 2009 and 2013. Finally, we provide a behavioral analysis based on social capitalists that tweet on a special hashtag. Our work emphasizes that such users, who act like automatic accounts, are in fact, for the most part, real users.
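As a reminder of the measure cited above, the following is a minimal sketch of the local clustering coefficient in its standard undirected form: the fraction of pairs of a node's neighbors that are themselves linked. The adjacency-dict representation, the function name `local_clustering`, and the toy graph are illustrative assumptions; the paper works on the directed Twitter graph and may use a different variant.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `adj` maps each node to the set of its neighbors (hypothetical
    representation chosen for this sketch).
    """
    neighbors = adj[node] - {node}  # ignore self-loops
    k = len(neighbors)
    if k < 2:
        # Fewer than two neighbors: no pair to close, define the value as 0.
        return 0.0
    # Count the neighbor pairs that are directly connected to each other.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Toy undirected graph: a triangle a-b-c plus a pendant node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

On this toy graph, node `a` has three neighbors of which only one pair (`b`, `c`) is linked, so its coefficient is 1/3, while `b` sits in a closed triangle and scores 1.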
Social networks such as Twitter or Facebook are part of the phenomenon called Big Data, a term used to describe very large and complex data sets. The connections between users of these networks can be easily represented using (directed) graphs. In this paper, we focus on two different aspects of social network analysis. First, our goal is to find an efficient and high-level way to store and process a social network graph, using reasonable computing resources (processor and memory). We believe that this is an important research interest, since it provides a more democratic method to deal with large graphs. Next, we turn our attention to the study of social capitalists, a specific kind of user on Twitter. Roughly speaking, such users try to gain visibility by following other users regardless of their content. Using two similarity measures called overlap index and ratio, we show that such users may be detected and classified very efficiently.
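The abstract does not define the two measures, so the sketch below rests on assumptions: the overlap index is taken in its usual set form, |A ∩ B| / min(|A|, |B|), applied to a user's follower and followee sets, and the ratio is taken as followers over followees. The function names and the example account are hypothetical, and no detection threshold is implied.

```python
def overlap_index(followers, followees):
    """Overlap index of two sets: |A & B| / min(|A|, |B|).

    Assumed definition (standard overlap coefficient); returns 0.0 when
    either set is empty.
    """
    smaller = min(len(followers), len(followees))
    if smaller == 0:
        return 0.0
    return len(followers & followees) / smaller


def follow_ratio(followers, followees):
    """Followers-to-followees ratio (assumed orientation of the ratio)."""
    if not followees:
        return float("inf")
    return len(followers) / len(followees)


# Hypothetical account: 4 followers, 5 followees, 3 reciprocal links.
followers = {"u1", "u2", "u3", "u4"}
followees = {"u2", "u3", "u4", "u5", "u6"}
overlap = overlap_index(followers, followees)
ratio = follow_ratio(followers, followees)
```

Here three of the four followers are followed back, giving an overlap of 0.75 and a ratio of 0.8. An account that systematically follows back would push the overlap toward 1.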
Papers by Nicolas Dugué