-
Notifications
You must be signed in to change notification settings - Fork 6
Expand file tree
/
Copy path2_pathpy.py
More file actions
246 lines (175 loc) · 13.6 KB
/
2_pathpy.py
File metadata and controls
246 lines (175 loc) · 13.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
#%%
import markdown
from IPython.core.display import display, HTML
def md(str):
display(HTML(markdown.markdown(str + "<br />")))
#%%
md("""
# 2 Introducing `pathpy`
**Ingo Scholtes**
Data Analytics Group
Department of Informatics (IfI)
University of Zurich
**September 5 2018**
In the introductory lecture we have seen that higher-order modelling, visualisation, and analysis techniques are useful to analyze **temporal network data** that provide us with **statistics of paths in complex networks**. But how can we apply higher-order network analytics to such such data in practice?
In this tutorial, I will introduce [``pathpy``](http://www.pathpy.net), an OpenSource `python` package that provides higher-order data analytics and representation learning techniques. It contains data structures, algorithms, data import/export methods, and visualisation techniques for various types of time series data on complex networks.
`pathpy` is pure `python` code with no platform-specific dependencies. It only depends on `numpy` and `scipy`, which come with `Anaconda`, so it should be very easy to install. In principle installing the latest 2.0 version of `pathpy` should be as easy as running
```
pip install pathpy2
```
on the terminal. In any case, you can find more detailed setup instructions on the [tutorial website](https://ingoscholtes.github.io/csh2018-tutorial/setup).
<span style="color:red">**TODO:** Import the package `pathpy` and rename it to `pp`</span>
""")
#%% In [1]
import pathpy as pp
#%%
md("""
A core functionality of `pathpy` is to read, calculate, store, manipulate, and model path statistics extracted from different kinds of temporal data on complex networks. For this `pathpy` provides the class `Paths`, which can store collections of paths with varying lengths. All classes and methods in `pathpy` are documented using `python`'s docstring feature so we can access the documentation using the standard `help` function.
<span style="color:red">**TODO:** Use the `help` function to obtain a description of the class `Paths`.</span>
""")
#%% In [2]
help(pp.Paths)
#%%
md("""
In Visual Studio Code, the documentation of classes, methods, and properties is automatically shown as a tooltip as you type. If you use the browser-based `jupyter notebook` editor, you can bring up the documentation by pressing `Shift`+`Tab` as you type. You can try this with the `Paths` class.
<span style="color:red">**TODO:** Create an empty `Paths` instance `toy_paths` by calling the constructor with no arguments.</span>
""")
#%% In [3]
toy_paths = pp.Paths()
#%%
md("""
We now have an empty `Paths` instance `toy_paths` that we can use to add path statistics to generate a small toy example. We can add paths using the method `add_path`. As the first parameter, it accepts any iterable (list, string, etc.) of `string` variables (or objects that can be cast to `string)`, where each entry in the iterable is one step (i.e. node) on a path. The optional `frequency` parameter captures the number of times a specific path has been observed.
<span style="color:red">**TODO:** Add 10 observations of a path $a \rightarrow c \rightarrow e$ between three nodes $a$, $c$, and $e$ to the `toy_paths` instance.</span>
""")
#%% In [4]
toy_paths.add_path(('a', 'c', 'd'), frequency=10)
#%%
md("""
Each class in `pathpy` provides a properly formatted string representation, which can be shown by invoking `print` on an instance.
<span style="color:red">**TODO**: Print a string summary of the instance `toy_paths`</span>
""")
#%% In [5]
print(toy_paths)
#%%
md("""
We get summary statistics of the `Paths` instance. Our toy example contains 10 observed paths between three nodes. These paths imply a graph topology with two edges `(a,b)` and `(b,c)`. Both the maximum and the average path length is two (the path length counts the number of edge traversals of a path).
To understand the last three lines and the second line in the output, we must look into the inner workings of `pathpy`. For the fitting of higher-order graphical models as well as for the representation learning algorithm, `pathpy` uses all path statistics available. Specifically to fit, say, a second-order model to a set of paths that all have length 10 or longer, we calculate which paths of length two are contained as sub-paths within these observations of longer paths. For this reason, `pathpy` automatically calculates the statistics of actual path observations as well as the statistics of **sub-paths** contained in these observed paths.
In our case, we have $10$ observations of a single path $a \rightarrow b \rightarrow c$ of length two, thus the last line in the output above. Each of these paths additionally contains two sub-paths $a \rightarrow b$ and $b \rightarrow c$ of length two, thus the number $20.0$ in the sub-path count in the line $k=1$. Finally, each of the paths contains three "paths" of length zero, which are just observations of a single node (i.e. there is no transition across an edge), thus the sub-path count of $30.0$ in the line $k=0$. This amounts to a total of $50.0$ sub-paths + $10$ observations of an actual (longest) path, thus explaining the second line in the output.
Apart from adding paths as a tuple, we can also add them as string-encoded n-grams, using the parameter `separator` to specify a character that separates nodes.
<span style="color:red">**TODO:** Create a new `Paths` instance `ngram_paths`, add 10 path observations using the ngram `"b-c-e"`, and print a summary of the resulting instance.</span>
""")
#%% In [6]
ngram_paths = pp.Paths()
ngram_paths.add_path('b-c-e', separator='-', frequency=10)
print(ngram_paths)
#%%
md("""
We obtain a `Paths` object with 10 observations of path $b\rightarrow c \rightarrow$. We can add this to our previous `toy_paths` instance by using arithmetic operators on instances of the `Paths` class.
<span style="color:red">**TODO:** Use arithmetic operators to add `toy_paths` and `ngram_paths` and print a summary of the result.</span>
""")
#%% In [7]
toy_paths += ngram_paths
print(toy_paths)
#%%
md("""
We obtain a new `Paths` instance with $20$ observed paths between five nodes $a$, $b$, $c$, $d$, and $e$ across four edges $(a,c)$, $(c,d)$, $(b,c)$ and $(c,e)$. Let us first use the function `Paths.write_file` to save these paths for later use.
<span style="color:red">**TODO:** Save the paths to an ngram file `data/toy_paths.ngram`.</span>
""")
#%% In [8]
toy_paths.write_file('data/toy_paths.ngram')
#%%
md("""
We often analyse or visualise graph or network topologies in which the observed paths have occurred. For this, `pathpy` provides the class `Network`, which you can use to read, manipulate, analyse, and visualise directed, undirected, weighted, and unweighted networks.
We can easily turn any `Paths` instance into a network by using the class method `Network.from_paths`. This will cut each path $v_0 \rightarrow v_1 \rightarrow v_2 \rightarrow \ldots$ into multiple directed *dyadic relations* $(v_i, v_{i+1})$ that are represented by directed edges.
<span style="color:red">**TODO**: Create a `Network` instance `toy_graph` from the `toy_paths` instance and print a summary of the network.</span>
""")
#%% In [9]
toy_graph = pp.Network.from_paths(toy_paths)
print(toy_graph)
#%%
md("""
We obtain a network with five nodes $a$, $b$, $c$, $d$, and $e$ and four directed edges. The number of times each edge is traversed by a path is captured by the **weights** of these edges. We can access these weights using the dictionary `toy_graph.edges[edge]`.
<span style="color:red">**TODO:** Print the `weight` of edge `(a, c)` in `toy_graph`.</span>
""")
#%% In [10]
print('Weight of edge (a,c) is {0}'.format(toy_graph.edges[('a', 'c')]['weight']))
#%%
md("""
In fact, since each edge can be viewed as a path of length one, the edge weights in our example are nothing more than the statistics of corresponding **sub-paths** of length one in the `toy_paths`. We can check the statistics of paths in a `Paths` instance by accessing `p.paths[length][path_tuple]`. This will return an array of two numbers, where the first entry is the number of times a path has been observed as **sub-path** of length l, and the second number gives the number of times a path has been observed as an **actual** (longest) path of length l.
<span style="color:red">**TODO:** Verify that the sub-path frequency of path `(a,c)` in `toy_paths` coincides with the weight of edge `(a,c)` in `toy_graph`.</span>
""")
#%% In [11]
print('Frequency of path (a,c) as subpath of length one is {0}'.format(toy_paths.paths[1][('a', 'c')][0]))
#%%
md("""
When dealing with rich, real-world data we often want to store additional attributes of nodes and edges. `pathpy`'s `Network` class accomodates for such settings by supporting arbitrary attributes both at the level of nodes and edges. Moreover, we can find nodes or edges with certain attributes by passing `lambda` functions to the `find_nodes` or `find_edges` method.
The attribute `x` of a node `v` or an edge `e` in a network instance `net` can be accessed simply by reading or writing to the dictionary net.nodes[v][x] and net.edges[e][x] respectively.
<span style="color:red">**TODO:** Set the node and edge attributes of some nodes and edges in `toy_graph`, and use `find_nodes` and `find_edges` to identify nodes and edges that have certain attributes.</span>
""")
#%% In [12]
# we can assign arbitrary additional attribute values to nodes and edges
toy_graph.edges[('a', 'c')]['edge_property'] = 'some value'
toy_graph.edges[('c', 'd')]['edge_property'] = 'some other value'
toy_graph.nodes['a']['node_property'] = 42.0
toy_graph.nodes['b']['node_property'] = 0.0
toy_graph.edges[('a', 'c')]['label'] = 'my_label'
# return edges with a given attribute value
print('Edges that satisfy edge_propery == "some value": ' + \
str(toy_graph.find_edges(select_edges = lambda e: 'edge_property' in e and e['edge_property'] == 'some value')))
# return nodes with a given attribute value
print('Nodes that satisfy node_property > 40.0: ' + \
str(toy_graph.find_nodes(select_node = lambda v: 'node_property' in v and v['node_property'] > 40.0)))
#%%
md("""
Finally, `pathpy` offers rich, interactive HTML5 visualisations of all of its internal objects that can also be shown inline in a `jupyter` notebook. To embed a visualisation of our example network in our notebook, we simply have to write the name of the variable as the last line of a `jupyter` cell.
Note that you can interact with the generated graph using the mouse. We can drag nodes as well as pan and zoom. Hovering above a node will show the name of that node.
<span style="color:red">**TODO**: Visualise the network `toy_graph`, drag a node, hover above a node, and pan and zoom.</span>
""")
#%% In [13]
toy_graph
#%%
md("""
Note the `[save svg]` label in the top-part of the output cell. Clicking this label will download the current view of the visualisation as a publication-grade scalable vector graphics file. This file is a good basis for further fine-tuning in SVG-compliant tools like Inkscape or Adobe Illustrator.
<span style="color:red">**TODO**: Export the file above as SVG by clicking the label.</span>
We can also programmatically style our network by using the generic `plot` function in the module `pathpy.visualisation`. This function allows us to pass parameters that will influence the visualisation.
<span style="color:red">**TODO**: Check the documentation of `pp.visualisation.plot` and change the color of nodes to red.</span>
""")
#%% In [14]
pp.visualisation.plot(toy_graph, node_color='red')
#%%
md("""
We often want to reuse the same visualisation parameters in multiple plots. For this, it is convenient to store our parameters in a dictionary and then pass all of them at once.
Let us explore some of the features supported by `pathpy`'s default visualisation templates. In the second session we will learn more about **custom visualisation templates** that can be defined by the user, and which can take arbitrary visualisation parameters.
<span style="color:red">**TODO**: Use the `help` function to show which parameters are supported by the function `pp.visualisation.plot`.</span>
""")
#%% In [15]
help(pp.visualisation.plot)
#%%
md("""
<span style="color:red">**TODO**: Create a parameter dictionary `style` that changes the plot size, switches off edge arrows, assigns individual colors to nodes, changes label position, color and font size, and adjust node size and edge width.</span>
""")
#%% In [16]
style = {'width': 300,
'height': 300,
'node_size': 18.0,
'edge_width' : 4.0,
'node_color' : {'a': '#aacc99', 'b': '#aacc99', 'd': '#aacc99', 'e': '#aacc99', 'c': '#cc6666'},
'edge_color' : '#ffaaaa',
'edge_arrows': False,
'label_color': '#000000',
'label_opacity': 1,
'label_offset': [0,0],
'label_size': '20px',
'edge_opacity': 1,
'force_charge': -10.0,
'force_repel': -550,
'force_alpha': 0.01
}
pp.visualisation.plot(toy_graph, **style)
#%%
md("""
Once we are satisfied with our visualisation, we can use the method `pp.visualisation.export_html` to save it as a stand-alone HTML file. This file will run in any HTML5 browser and we can share or publish as an interactive visualisation on the web.
<span style="color:red">**TODO**: Save your visualisation to a file `test_network.html`. Reuse the visualisation parameters from above.</span>
""")
#%% In [17]
pp.visualisation.export_html(toy_graph, filename='visualisations/2_simple_network.html', **style)