A close look at the Gephi user community

Last December we asked Gephi users to participate in a survey. The survey’s main objective was to better understand who our users are and what kind of projects they work on. One important dimension we wanted to explore was the diversity of the user community. From the projects we’ve seen in research and on the web, we knew that Gephi users were diverse, but we wanted to quantify it. Ultimately, we aim to improve the tool so that it supports users’ needs, but this first requires a good understanding of who the audience is and what their objectives are. Below we summarize our findings about the profile of users, the types of networks they work with and, finally, usage statistics the community can reflect on.

Profile

The largest share of Gephi users work in academia. The project started in the academic sphere, from where it has spread into business, artistic and non-profit domains as well. Working at a for-profit organization is the second most common occupation, which confirms that network analysis is no longer reserved for scientists.

Q12. What is your occupation? n=285; multiple choice

Given that the largest group of users works in academia, it is not surprising that the most common title among Gephi users is researcher.

Q14. What is your title? n=285; multiple choice

The user community is also widely spread around the world. Users from 46 different countries participated in this study. This confirms the importance of localization into as many languages as possible (Gephi currently supports eight). While many countries were represented by only a handful of participants, the largest concentrations of users are, as expected, in the US (23%) and in France (15%). The significant presence in France is explained by Gephi’s presence in the universities and businesses within which it was originally founded.

Networks

Social networks are by far the most commonly analyzed type of network in Gephi: 70% of respondents say that they typically analyze social networks. Social media and semantic networks are also common, typically analyzed by 46% and 43% of respondents, respectively. The remaining network types are less common, with ecological networks analyzed by about 5% of users.

Despite SNA (Social Network Analysis) being the dominant use, there is a large variety of other uses as well. That said, networks can be analyzed only if the data are accessible, and we (the community) still have work to do to ease network collection and formatting.

We always wondered whether people in given occupations are more likely to work with specific types of networks. Based on this study, some differences exist, but they are not as prominent as we had expected. We found that people working at for-profit organizations are more likely to use Gephi to analyze business and financial networks. While 24% of all respondents use Gephi to analyze business networks, the figure is 44% among those who work at a for-profit organization, compared to only 12% among those who do not. Differences for other types of networks were not conclusive.
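
As a rough sanity check on these figures, a weighted average ties the three percentages together. The sketch below backs out the share of respondents at profit organizations that the 24%, 44% and 12% figures would imply. The variable names are illustrative, and because the published percentages are rounded the result is only approximate.

```python
# Percentages quoted in the text above (rounded survey figures).
overall = 0.24     # all respondents who analyze business networks
in_profit = 0.44   # same share among respondents at profit organizations
elsewhere = 0.12   # same share among all other respondents

# overall = p * in_profit + (1 - p) * elsewhere, solved for p.
implied_profit_share = (overall - elsewhere) / (in_profit - elsewhere)

print(f"Implied share of respondents at profit organizations: {implied_profit_share:.1%}")
```

If the share reported for Q12 is in the same range, the three numbers are mutually consistent.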

Q5. What type(s) of network do you typically analyze using Gephi? n=285; multiple choice

Gephi users commonly deal with a wide range of network sizes. Although the typical network has between 100 and 10K nodes, every size from <100 nodes to 1M nodes is handled by at least 10% of users. In total that is a difference of more than 5 orders of magnitude in data size, and that is without taking edges into consideration!

Q6. What is/are the graph size(s) you deal with when working with Gephi? n=285; multiple choice
Q7. And what is the TYPICAL size of a graph that you manipulate with Gephi? n=285; single choice

While more than half of Gephi users have never used Gephi to analyse dynamic networks, the vast majority of the community is likely to do so in the future. This confirms the importance of the features related to dynamic networks, which have long been one of Gephi’s primary focuses.

Q8. Have you ever used Gephi to work with dynamic networks (networks over time)? n=285; single choice

Q9. How likely are you to use Gephi to analyze dynamic networks (networks over time) in the future? n=285; single choice

Usage

Both online and offline sources are important touch points through which people learn about Gephi for the first time. While web search is the most common way people find Gephi, word of mouth remains an important channel and is not to be underestimated.

Q2. How did you first learn about Gephi? n=285; single choice

The community is very diverse when it comes to usage frequency, which suggests that Gephi users are likely to have diverse needs. Occasional users are likely to have different expectations of the software than regular users. About one third use Gephi at least once a week, which confirms that there is a relatively large base of heavy users.

Q3. On average, how often do you use Gephi? n=285; single choice

Online tutorials and online forums are key sources for users to learn about Gephi. This confirms the importance of creating and updating online tutorials. It also suggests that the community is engaged enough for users to answer one another’s questions on online forums and groups.

Q4. What source(s) have you used/are you using to learn how to use Gephi? n=285; multiple choice

Conclusion

This survey is a first but important step in understanding the Gephi user community at large. It also gives a general overview of the network visualization and analytics field, and we hope it can be useful for others as well. For us, the Gephi leadership team, it will help in our future community management efforts. It will also help us design a better tool, as we now better understand its user community.

In addition, talking about the kinds of projects users work on helps shape the understanding of what network analytics is used for, and ultimately brings more people to the community. In the near future we want to double down on this topic and start a series of articles highlighting the most interesting projects. Many of the respondents indicated their willingness to share what they have worked on, so there’s already plenty to choose from.

Finally, we believe the diversity of users simply reflects the fact that networks are everywhere. Analyzing networks brings insights and answers to many different problems.

Appendix
  • The survey was conducted among the Gephi user community. While the results provide a unique view into the Gephi community, it is important to clarify that they are not meant to be representative of the entire community worldwide.
  • The survey invitations were distributed throughout the week of Dec 1st 2015 via email, Twitter and Facebook
  • Final data set contains responses collected between Dec 1st 2015 and Dec 23rd 2015
  • A total of 285 participants completed the survey

Scientific graphs Generators plugin

Cezary Bartosiak and Rafał Kasprzyk just released the Complex Generators plugin, introducing many long-awaited scientific generators. These generators are extremely useful for scientists, as they help to simulate various real networks. Scientists can use them to test their models and algorithms on well-studied graph examples. For instance, the Watts-Strogatz generator creates networks as described by Duncan Watts in his Six Degrees book.

The plugin contains the following generators:

  • Balanced Tree
  • Barabasi Albert
  • Barabasi Albert Generalized
  • Barabasi Albert Simplified A
  • Barabasi Albert Simplified B
  • Erdos Renyi Gnm
  • Erdos Renyi Gnp
  • Kleinberg
  • Watts Strogatz Alpha
  • Watts Strogatz Beta
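
To give a feel for what such generators do, here is a minimal Python sketch of two of the models listed above: Erdos Renyi Gnp and the Watts Strogatz Beta (rewiring) model. It is an illustration only and is not the plugin's code. The function names, the edge-set representation and the `seed` parameter are assumptions made for this example.

```python
import random
from itertools import combinations


def erdos_renyi_gnp(n, p, seed=None):
    """G(n, p): keep each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def watts_strogatz_beta(n, k, beta, seed=None):
    """Ring lattice with k neighbours per node; each edge is rewired with probability beta."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)

    def key(a, b):
        # Store undirected edges as sorted pairs.
        return (a, b) if a < b else (b, a)

    # Start from a ring where every node is linked to its k/2 nearest
    # neighbours on each side.
    lattice = [(u, (u + s) % n) for u in range(n) for s in range(1, k // 2 + 1)]
    edges = {key(u, v) for u, v in lattice}

    # Visit each lattice edge once; with probability beta, move its far end
    # to a node that is not yet a neighbour of u (no self-loops, no duplicates).
    for u, v in lattice:
        if rng.random() < beta:
            candidates = [w for w in range(n) if w != u and key(u, w) not in edges]
            if candidates:
                edges.remove(key(u, v))
                edges.add(key(u, rng.choice(candidates)))
    return edges


random_graph = erdos_renyi_gnp(20, 0.2, seed=42)
small_world = watts_strogatz_beta(20, 4, 0.1, seed=42)
print(len(random_graph), "edges in G(n, p);", len(small_world), "edges in the small world")
```

Rewiring moves edges without adding or removing any, so the small-world graph always keeps the n*k/2 edges of the initial ring.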

The plugin can be installed directly from Gephi 0.8, from the Plugins menu.

The source code is available on Launchpad.

Book store: Theory & Practice

Gephi now has its own book store!

It’s a great place for those who want to discover the key theories behind networks. It also has “Information Visualization” and “Programming” sections for those who want to master the subject and join the Gephi team. All these books give valuable information for understanding what guides the people who are developing Gephi and how concepts were put into practice.

The Network Science section covers the science behind networks. It describes where networks are found in nature, society or organizations and helps understand their properties and patterns. Newcomers can start with Linked by Albert-László Barabási, the major reference, from 2002. You can also jump directly to Bursts, Barabási’s new book released a few days ago.

Social network theory views a network as actors who are connected by a set of relationships, and the field is referred to as Social Network Analysis (SNA). As people increasingly use social networking websites (e.g. Facebook, YouTube, LinkedIn etc.), Social Network Analysis brings tools to study patterns of communication and communities. Social Network Analysis by Wasserman & Faust is a major reference.

These books are for all audiences, while researchers will find a clear state of the art of the domain in The Structure and Dynamics of Networks or Dynamical Processes on Complex Networks.

Data Visualization and Human-Computer Interaction (HCI) are at the base of Gephi. Learn how visualization and interaction enhance understanding and knowledge discovery in complex data. The terms Information Visualization and Visual Analytics refer to this domain as well.

One can easily find the roots of Visual Analytics in the book Readings in Information Visualization: Using Vision to Think by Stuart Card, Jock Mackinlay and Ben Shneiderman. Exploratory Data Analysis started with John Tukey, and has recently been extended by Andrienko.

The final piece is knowledge of efficient programming, in particular how to design modular software based on services, with Practical API Design: Confessions of a Java Framework Architect. And as the human factor is central, take a look at The Mythical Man-Month: Essays on Software Engineering. Mathieu is a great fan 😉

We also encourage you to go further with more references at the Reader’s circle and, of course, to send us book suggestions.

Future Internet and Society: A Complex Systems Perspective

The European Science Foundation (ESF), in partnership with COST, is organising the following conference:

Future Internet and Society: A Complex Systems Perspective

Hotel Villa del Mare, Acquafredda di Maratea, Italy
2-7 October 2010

Chair: Romualdo Pastor-Satorras – Departament de Física i Enginyeria Nuclear (FIB), Universitat Politècnica de Catalunya, ES
Co-Chair: Claudio Castellano – CNR-ISC (Istituto dei Sistemi Complessi) and Dipartimento di Fisica, Sapienza Università di Roma, IT

The digital revolution and the advent of the Internet are transforming the way we work and how we spend our free time. These phenomena are also changing how we communicate with each other and the way in which we establish and maintain our social relations. The relationship between the Internet and society is complex and bidirectional, leading to a co-evolution of the two systems. In fact, the Internet exists because humans need networking, and the Internet’s evolution is ultimately driven by our ever-increasing use of it.

Gephi initiator interview: how “Semiotics matter”

Today I have the honor of interviewing a special member of the Gephi team: Mathieu Jacomy.

Mathieu is an engineer, a founder of the WebAtlas NGO and a teacher at Sciences Po Paris, and he leads R&D in the TIC Migrations program at the Fondation Maison des Sciences de l’Homme and the Telecom ParisTech school.
He is the main developer of the “Navicrawler” software. He also created the first Gephi prototype.

Sebastien Heymann: Hi Mathieu Jacomy, you are the creator of Graphiltre, the first Gephi prototype, which you developed in 2006. What was the purpose of making yet another graph software?
Mathieu Jacomy: Hi! I’m glad to answer your questions, and I hope our readers will be pleased to know more about Gephi.

At that time I was analyzing a lot of graphs and I wasn’t satisfied with the existing free tools. That’s why I started to build my own tools.

I had no money to use professional tools, and I needed to understand precisely what the software was doing: open source, free software perfectly fit these constraints.
I was using the amazing software Guess, created by Eytan Adar, who built it for his own needs. I was doing much the same thing as him, and I couldn’t start to explore graphs without this tool.
But I wasn’t satisfied, because the software didn’t allow many manipulations. I couldn’t look at the substructures as easily as I wanted, and it was difficult to make nice cartographies.
I was dreaming of a “graph-dedicated Photoshop”, a visualization-oriented software rather than a script-oriented tool.

A good way to figure out what I mean is to look at the spatialization process. In well-known software such as Pajek or Guess, you have algorithms called “layout”, “force-vectors” or “energy model”. These algorithms give the graph its shape, and this is probably the most critical part of the process of building a clear visualization, because the substructures or “patterns” that one may see in the image strongly depend on the algorithm and the settings chosen. But at the same time, most users also want to quickly look at the global shape of the graph, and may not be aware that it’s important to search for the best algorithm depending on the time you have, the quality you want, the size of the graph, its degree distribution, the substructure that you expect to recognize… I was careful with these algorithms, but even though I understood their principles and specificities, I couldn’t figure out how they were transforming the graph, and I couldn’t evaluate their differences.

Why? Because in these programs you can’t:
– Manipulate the graph while the algorithm is running
– Modify the settings while the algorithm is running
– And sometimes, you can’t even see the graph while the algorithm is running
How can you understand what’s happening there? Of course I started to work on software that allowed this. But the same kinds of problems appear again in other parts of the process, like filtering, image exporting… Pajek is clearly built from a mathematical perspective. Guess is more user-friendly, but not enough. I didn’t want a tool for mathematics experts, but a tool for people who actually have to explore and understand graphs. A professional tool for a job that didn’t exist at the time.

This was the starting point of “Graphiltre”: building a graph exploration system so that you can understand what you are doing by looking at what happens on the screen, and do anything (including filtering) without typing a single line of script.
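
To make the idea of watching and adjusting a layout while it runs more concrete, here is a minimal Python sketch of a force-directed layout written as a single-step function, so that the caller can redraw the graph or change a setting between iterations. It is an illustration only, not the algorithm used in Graphiltre, Gephi, Pajek or Guess. The function name, the force formulas and all parameter values are assumptions chosen for simplicity.

```python
import math
import random


def layout_step(pos, edges, repulsion=1.0, attraction=0.1, step=0.05):
    """Advance a simple force-directed layout by one iteration; return new positions."""
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    # Every pair of nodes pushes each other apart with magnitude repulsion / distance.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            push = repulsion / (dist * dist)
            force[a][0] += push * dx
            force[a][1] += push * dy
            force[b][0] -= push * dx
            force[b][1] -= push * dy
    # Connected nodes pull each other together with magnitude attraction * distance.
    for a, b in edges:
        dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
        force[a][0] -= attraction * dx
        force[a][1] -= attraction * dy
        force[b][0] += attraction * dx
        force[b][1] += attraction * dy
    return {n: (pos[n][0] + step * fx, pos[n][1] + step * fy)
            for n, (fx, fy) in force.items()}


rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
pos = {n: (rng.random(), rng.random()) for n in range(4)}
for i in range(200):
    # A setting is changed halfway through the run; an interactive tool
    # would also redraw the graph after every step.
    attraction = 0.1 if i < 100 else 0.3
    pos = layout_step(pos, edges, attraction=attraction)
```

Because each call performs only one iteration, nothing prevents moving a node or changing `repulsion` and `attraction` between calls, which is the kind of interaction described above.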

Diseasome, explore the human disease network

The Gephi team presents today a science-mapping project: Diseasome. Magali Roux, Senior Scientist at CNRS, asked us to create a website to accompany the publication of her book, Biology – The digital era, so we worked on the “Human Disease Network” dataset and built a network exploration platform.

“On a unique place, one can find information about the book, the dataset related to the writings, an online data exploration framework and the file to manipulate these data with Gephi.”

The HDN (Human Disease Network) and the GDN (Gene Disease Network) were extracted from the original dataset and treated with Gephi. From the results, an interactive map has been created with the help of RTGI/Linkfluence tools. A poster is also available, with the full network and some useful statistics.

Although this work is experimental, we hope it can help scientists to explore and search in this complexity. The Diseasome is above all an innovative way to present a scientific work. The importance of complex data in science and particularly network graphs brings a lot of challenges. As well as computational issues, many things can be done with graphic design and interaction.

Explore the Diseasome
