Source code and datasets for the paper [Graphlets over Time: A New Lens for Temporal Graph Analysis]
The contributions of this work are as follows:
- Patterns : We report several observations about how graphlets evolve in temporal graphs: a surprising similarity among graphs from the same domain, and local-structural signals of the future importance of nodes and edges.
- Tool : We introduce graphlet transition graphs, which are an effective tool for measuring the similarity of local dynamics in temporal graphs of different sizes.
- Prediction : We enhance the accuracy of predicting the future importance of nodes and edges by introducing role-based local features, which are complementary to global features.
The preprocessed datasets and the centralities of nodes used in the paper are provided here. Please download the files from the above link and put them under the "./data" folder, so that the directory hierarchy looks like the following:
    data
      |__centrality
          |__askubuntu-degree.csv
          |__askubuntu-between.csv
          |__...
      |__askubuntu.out
      |__askubuntu-random.out
      |__...
    src
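
Before running the scripts, it may help to confirm that the downloaded files ended up in the expected places. The following is a minimal sketch and not part of this repository: the function name `missing_files` is hypothetical, and it assumes that each dataset has a `.out` file, a `-random.out` file, and one CSV per centrality under `data/centrality`, as the askubuntu example above suggests. Only the `degree` and `between` centrality names appear in the listing, so other names would have to be added by hand.

```python
from pathlib import Path
from typing import List, Sequence


def missing_files(root: str, dataset: str,
                  centralities: Sequence[str] = ("degree", "between")) -> List[Path]:
    """Return the expected files for one dataset that are not present under root."""
    base = Path(root)
    # Assumed layout: edge files directly under root, centralities in a subfolder.
    expected = [base / f"{dataset}.out", base / f"{dataset}-random.out"]
    expected += [base / "centrality" / f"{dataset}-{name}.csv" for name in centralities]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_files("./data", "askubuntu"):
        print("missing:", path)
```

An empty result means that every file the sketch looks for is present; it does not verify the file contents.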
Original datasets used in the paper are listed as follows:
| Name | #Nodes | #Edges | Description | Download Original Data |
|---|---|---|---|---|
| cite-HepPh (hepph) | 34,565 | 346,849 | Citation | Link |
| cite-HepTh (hepth) | 18,477 | 136,190 | Citation | Link |
| cite-Patents (patent) | 3,774,362 | 16,512,782 | Citation | Link |
| email-Enron (enron) | 55,655 | 209,203 | Email/Message | Link |
| email-EU-core-temporal (email-eu) | 986 | 24,929 | Email/Message | Link |
| CollegeMsg (college_msg) | 1,899 | 20,296 | Email/Message | Link |
| sx-askubuntu (askubuntu) | 159,316 | 262,106 | Online Q/A | Link |
| sx-mathoverflow (mathoverflow) | 24,818 | 90,489 | Online Q/A | Link |
| sx-stackoverflow (stackoverflow) | 2,601,977 | 16,266,395 | Online Q/A | Link |
- Our code works on both Windows 10 and Linux.
- JDK version : 15.0.1, Python version : 3.7.0.
- The input file should contain a set of temporal edges. Each temporal edge is written on its own line as 1) the index of the source node, 2) the index of the destination node, and 3) its timestamp, separated by spaces.
- The node index starts from 0 and increases by 1 whenever a new node arrives (i.e., 0, 1, 2, ..., |V|-2, |V|-1).
- For example, for a set of 3 temporal edges (0 → 1, 2001-01-01), (0 → 2, 2001-01-01), and (1 → 3, 2001-01-02), the input file should be:
0 1 2001-01-01
0 2 2001-01-01
1 3 2001-01-02
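
A raw edge list with arbitrary node names can be converted into this format with a short script. The following is a minimal sketch and not part of this repository: the function name `relabel_edges` is hypothetical, and it assumes that edges are ordered by timestamp, that timestamps are `YYYY-MM-DD` strings (which sort correctly as text), and that within an edge the source node is indexed before the destination node.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (source, destination, timestamp)


def relabel_edges(edges: Iterable[Edge]) -> List[str]:
    """Return 'src dst timestamp' lines with nodes indexed by order of arrival."""
    index: Dict[Hashable, int] = {}
    lines: List[str] = []
    # sorted() is stable, so edges with equal timestamps keep their input order.
    for src, dst, timestamp in sorted(edges, key=lambda edge: edge[2]):
        for node in (src, dst):
            if node not in index:
                index[node] = len(index)  # next unused index: 0, 1, 2, ...
        lines.append(f"{index[src]} {index[dst]} {timestamp}")
    return lines


if __name__ == "__main__":
    raw = [
        ("alice", "bob", "2001-01-01"),
        ("alice", "carol", "2001-01-01"),
        ("bob", "dave", "2001-01-02"),
    ]
    print("\n".join(relabel_edges(raw)))
```

With the toy names above, the script prints the same three lines as the example input file.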
You can create the intermediate files and view the results by running the script files below. Because the large datasets (Patent, Stackoverflow) take a long time to run, they are commented out.
- draw-correlation-heatmap.sh : draw a heatmap which represents the similarity between graphs. (Figures 1 and 6)
- generate-all-evolution.sh : generate the distributions of ratios of graphlet instances over time among all datasets.
- graph-evolution.sh : draw ratios of instances of graphlets over time. (Table 3)
- graphlet-transition-graph.sh : draw graphlet transition graphs. (Table 5)
- node_signal.sh : draw the Spearman's rank correlation coefficient between node role ratios and future centralities. (Figure 4)
- node_prediction.sh : generate node features (node role, node prominence profile, and global statistics) and predict the centrality of nodes using them.
- edge_signal.sh : draw the Spearman's rank correlation coefficient between edge role ratios and future centralities. (Figure 5)
- edge_prediction.sh : generate edge features (edge role and global statistics) and predict the centrality of edges using them.
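
The signal scripts above report Spearman's rank correlation coefficient, which is the Pearson correlation computed on ranks instead of raw values. The following is an illustrative sketch of that statistic in plain Python, not the code used by the scripts: the function names `average_ranks` and `spearman` are hypothetical, tied values are given their average rank, and the sample numbers are made-up toy values.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    role_ratio = [0.1, 0.4, 0.2, 0.8]          # toy role ratios, one per node
    future_centrality = [3.0, 10.0, 5.0, 20.0]  # toy future centralities
    print(spearman(role_ratio, future_centrality))
```

Because only ranks matter, any monotonically increasing relationship between a role ratio and a future centrality yields a coefficient of 1, even when the relationship is not linear.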