struc2vec: Learning Node Representations from Structural Identity

Leonardo F. R. Ribeiro, Pedro H. P. Saverese, Daniel R. Figueiredo
Systems Eng. and Comp. Science Dep., Federal University of Rio de Janeiro
leo@[Link]   savarese@[Link]   daniel@[Link]
ABSTRACT
Structural identity is a concept of symmetry in which network nodes are identified according to the network structure and their relationship to other nodes. Structural identity has been studied in theory and practice over the past decades, but only recently has it been addressed with representational learning techniques. This work presents struc2vec, a novel and flexible framework for learning latent representations for the structural identity of nodes. struc2vec uses a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to encode structural similarities and generate structural context for nodes. Numerical experiments indicate that state-of-the-art techniques for learning node representations fail to capture stronger notions of structural identity, while struc2vec exhibits much superior performance on this task, as it overcomes limitations of prior approaches. As a consequence, numerical experiments indicate that struc2vec improves performance on classification tasks that depend more on structural identity.

Figure 1: An example of two nodes (u and v) that are structurally similar (degrees 5 and 4, connected to 3 and 2 triangles, connected to the rest of the network by two nodes), but very far apart in the network.

CCS CONCEPTS
• Computing methodologies → Unsupervised learning; Learning latent representations; • Artificial Intelligence → Learning;

KEYWORDS
feature learning; node embeddings; structural identity

1 INTRODUCTION
In almost all networks, nodes tend to have one or more functions that greatly determine their role in the system. For example, individuals in a social network have a social role or social position [11, 19], while proteins in a protein-protein interaction (PPI) network exert specific functions [1, 22]. Intuitively, different nodes in such networks may perform similar functions, such as interns in the social network of a corporation or catalysts in the PPI network of a cell. Thus, nodes can often be partitioned into equivalence classes with respect to their function in the network.

Although identification of such functions often leverages node and edge attributes, a more challenging and interesting scenario emerges when node function is defined solely by the network structure. In this context, not even the labels of the nodes matter, just their relationship to other nodes (edges). Indeed, mathematical sociologists have worked on this problem since the 1970s, defining and computing the structural identity of individuals in social networks [11, 17, 19]. Beyond sociology, the role of webpages in the webgraph is another example of identity (in this case, hubs and authorities) emerging from the network structure, as defined by the celebrated work of Kleinberg [8].

The most common practical approaches to determining the structural identity of nodes are based on distances or recursions. In the former, a distance function that leverages the neighborhood of the nodes is used to measure the distance between all node pairs, with clustering or matching then performed to place nodes into equivalence classes [5, 9]. In the latter, a recursion with respect to neighboring nodes is constructed and then iteratively unfolded until convergence, with the final values used to determine the equivalence classes [3, 8, 26]. While such approaches have advantages and disadvantages, we provide an alternative methodology, one based on unsupervised learning of representations for the structural identity of nodes (to be presented). Recent efforts in learning latent representations for nodes in networks have been quite successful in performing classification and prediction tasks [6, 14, 16, 23]. In particular, these efforts encode nodes using as context a generalized notion of their neighborhood (e.g., w steps of a random walk, or nodes with neighbors in common). In a nutshell, nodes that have neighborhoods with similar sets of nodes should have similar latent representations. But neighborhood is a local concept defined by some notion of proximity in the network. Thus, two nodes with neighborhoods that are structurally similar but that are far apart will not have similar latent representations. Figure 1 illustrates the problem, where nodes u and v play similar roles (i.e., have similar local structures) but are very far apart in the network. Since their neighborhoods have no common nodes, recent approaches cannot capture their structural similarity (as we soon show).
It is worth noting why recent approaches for learning node representations such as DeepWalk [16] and node2vec [6] succeed in classification tasks but tend to fail in structural equivalence tasks. The key point is that many node features in most real networks exhibit strong homophily (e.g., two blogs with the same political inclination are much more likely to be connected than at random). Neighbors of nodes with a given feature are more likely to have the same feature. Thus, nodes that are "close" in the network and in the latent representation will tend to share features. Likewise, two nodes that are "far" in the network will tend to be separated in the latent representation, independent of their local structure. Thus, structural equivalence will not be properly captured in the latent representation. However, if classification is performed on features that depend more on structural identity and less on homophily, then such recent approaches are likely to be outperformed by latent representations that better capture structural equivalence (as we soon show).
Our main contribution is a flexible framework for learning latent representations for the structural identity of nodes, called struc2vec. It is an alternative and powerful tool for the study of structural identity through latent representations. The key ideas of struc2vec are:

• Assess structural similarity between nodes independently of node and edge attributes as well as of their position in the network. Thus, two nodes that have a similar local structure will be considered similar, independent of network position and node labels in their neighborhoods. Our approach also does not require the network to be connected, and identifies structurally similar nodes in different connected components.
• Establish a hierarchy to measure structural similarity, allowing progressively more stringent notions of what it means to be structurally similar. In particular, at the bottom of the hierarchy, structural similarity between nodes depends only on their degrees, while at the top of the hierarchy similarity depends on the entire network (from the viewpoint of the node).
• Generate random contexts for nodes, which are sequences of structurally similar nodes as observed by a weighted random walk traversing a multilayer graph (and not the original network); a minimal sketch follows this list. Thus, two nodes that frequently appear with similar contexts will likely have similar structure. Such context can be leveraged by language models to learn latent representations for the nodes.
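The third idea above can be illustrated with a small Python sketch: a weighted random walk over a toy two-layer similarity graph, where the walker moves within a layer to nodes with probability proportional to the edge weight and occasionally switches layers. The hard-coded weights, the stay_prob parameter, and the layer-switching rule are illustrative assumptions only; the actual edge weights and transition probabilities used by struc2vec are defined in Section 3 and are not reproduced here.

    import random

    # Toy multilayer similarity graph over three nodes. weights[k][u][v] is the
    # weight of edge (u, v) in layer k. In struc2vec these weights would be
    # derived from hierarchical structural distances (larger weight for more
    # structurally similar pairs); here they are hard-coded for illustration.
    weights = {
        0: {"u": {"v": 1.0, "w": 0.5}, "v": {"u": 1.0, "w": 0.8}, "w": {"u": 0.5, "v": 0.8}},
        1: {"u": {"v": 0.7, "w": 0.1}, "v": {"u": 0.7, "w": 0.4}, "w": {"u": 0.1, "v": 0.4}},
    }

    def weighted_choice(neighbors):
        # Pick a neighbor with probability proportional to its edge weight.
        nodes = list(neighbors)
        total = sum(neighbors[n] for n in nodes)
        r, acc = random.random() * total, 0.0
        for n in nodes:
            acc += neighbors[n]
            if acc >= r:
                return n
        return nodes[-1]

    def multilayer_walk(start, length, stay_prob=0.7):
        # Only node names are recorded; layer changes do not add to the context.
        node, layer, walk = start, 0, [start]
        while len(walk) < length:
            if random.random() < stay_prob:
                node = weighted_choice(weights[layer][node])
                walk.append(node)
            elif layer + 1 in weights and random.random() < 0.5:
                layer += 1          # move up one layer (more stringent similarity)
            else:
                layer = max(layer - 1, 0)
        return walk

    print(multilayer_walk("u", length=10))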
We implement an instance of struc2vec and show its potential through numerical experiments on toy examples and real networks, comparing its performance with DeepWalk [16] and node2vec [6], two state-of-the-art techniques for learning latent representations for nodes, and with RolX [7], a recent approach to identify roles of nodes. Our results indicate that while DeepWalk and node2vec fail to capture the notion of structural identity, struc2vec excels at this task, even when the original network is subject to strong random noise (random edge removal). We also show that struc2vec is superior in a classification task where node labels depend more on structural identity (i.e., air-traffic networks with labels representing airport activity).

The remainder of this paper is organized as follows. Section 2 briefly overviews the recent related work on learning latent representations of nodes in networks. Section 3 presents the struc2vec framework in detail. Experimental evaluation and comparison to other methods are shown in Section 4. Finally, Section 5 concludes the paper with a brief discussion.

2 RELATED WORK
Embedding network nodes in (Euclidean) space has received much attention over the past decades from different communities. The technique is instrumental for Machine Learning applications that leverage network data, as node embeddings can be directly used in tasks such as classification and clustering.

In Natural Language Processing [2], generating dense embeddings for sparse data has a long history. Recently, Skip-Gram [12, 13] was proposed as an efficient technique to learn embeddings for text data (e.g., sentences). Among other properties, the learned language model places semantically similar words near each other in space.

Learning a language model from a network was first proposed by DeepWalk [16]. It uses random walks to generate sequences of nodes from the network, which are then treated as sentences by Skip-Gram. Intuitively, nodes close in the network will tend to have similar contexts (sequences) and thus have embeddings that are near one another. This idea was later extended by node2vec [6]. By proposing a biased second-order random walk model, node2vec provides more flexibility when generating the context of a vertex. In particular, the edge weights driving the biased random walks can be designed in an attempt to capture both vertex homophily and structural equivalence. However, a fundamental limitation is that structurally similar nodes will never share the same context if their distance (hop count) is larger than the Skip-Gram window.
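As a minimal DeepWalk-style sketch of this family of methods (an illustration, not the implementation evaluated in Section 4), the following Python code generates uniform random walks over a toy graph and feeds them to gensim's Skip-Gram model; the walk length, number of walks per node, and embedding dimension are arbitrary choices.

    import random
    import networkx as nx
    from gensim.models import Word2Vec

    # Toy graph; any (possibly disconnected) graph would do.
    G = nx.karate_club_graph()

    def random_walk(graph, start, length):
        # Uniform random walk, returning node ids as strings (Word2Vec tokens).
        walk = [start]
        for _ in range(length - 1):
            neighbors = list(graph.neighbors(walk[-1]))
            if not neighbors:
                break
            walk.append(random.choice(neighbors))
        return [str(v) for v in walk]

    # One "sentence" per node per pass over the graph, DeepWalk-style.
    walks = [random_walk(G, v, length=40) for _ in range(10) for v in G.nodes()]

    # Skip-Gram (sg=1) over the walks; nodes that co-occur in walks, i.e. nodes
    # that are close in the graph, end up with nearby embeddings.
    model = Word2Vec(walks, vector_size=64, window=5, sg=1, min_count=0, workers=2, epochs=5)
    print(model.wv.most_similar("0", topn=5))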
subgraph2vec [14] is another recent approach for learning embeddings, in this case for rooted subgraphs, and unlike the previous techniques it does not use random walks to generate context. Instead, the context of a node is simply defined by its neighbors. Additionally, subgraph2vec captures structural equivalence by embedding nodes with the same local structure at the same point in space. Nonetheless, its notion of structural equivalence is very rigid, since it is defined as a binary property dictated by the Weisfeiler-Lehman isomorphism test [21]. Thus, two nodes that are structurally very similar (but fail the test) and have non-overlapping neighbors may not be close in space.

Similarly to subgraph2vec, considerable effort has recently been made on learning richer representations for network nodes [4, 24]. However, building representations that explicitly capture structural identity is a relatively orthogonal problem that has not received much attention. This is the focus of struc2vec.

A recent approach to explicitly identifying the role of nodes using just the network structure is RolX [7]. This unsupervised approach is based on enumerating various structural features for nodes, finding the most suitable basis vectors for this joint feature space, and then assigning to every node a distribution over the identified roles (basis), allowing for mixed membership across the roles. Without explicitly considering node similarity or node context (in terms of structure), RolX is likely to miss node pairs that are structurally equivalent (to be shown).
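As a loose illustration of this style of role discovery (a stand-in only: RolX builds its features with recursive feature extraction, which is not reproduced here), the following sketch factorizes a small matrix of hand-picked structural features with non-negative matrix factorization to obtain mixed-membership role assignments. The feature set and the number of roles are arbitrary assumptions.

    import networkx as nx
    import numpy as np
    from sklearn.decomposition import NMF

    G = nx.karate_club_graph()
    nodes = list(G.nodes())

    # A few simple, non-negative structural features per node (illustrative,
    # not RolX's actual feature set): degree, clustering coefficient, egonet size.
    features = np.array([
        [G.degree(v), nx.clustering(G, v), len(nx.ego_graph(G, v))]
        for v in nodes
    ])

    # Factorize into node-role memberships (W) and role-feature profiles (H).
    nmf = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
    W = nmf.fit_transform(features)   # soft (mixed-membership) role assignment
    H = nmf.components_

    for v, membership in list(zip(nodes, W))[:5]:
        print(v, np.round(membership / (membership.sum() + 1e-9), 2))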
3 STRUC2VEC
Consider the problem of learning representations that capture the structural identity of nodes in the network. A successful approach should exhibit two desired properties:

• The distance between the latent representations of nodes should be strongly correlated to their structural similarity. Thus, two nodes that have identical local network structures should have the same latent representation, while nodes with different structural identities should be far apart.
• The latent representation should not depend on any node or edge attribute, including the node labels. Thus, structurally similar nodes should have close latent representations, independent of node and edge attributes in their neighborhoods. The structural identity of a node must be independent of its "position" in the network.

Given these two properties, we propose struc2vec, a general framework for learning latent representations for nodes composed of four main steps, as follows:

(1) Determine the structural similarity between each vertex pair in the graph for different neighborhood sizes. This induces a hierarchy in the measure of structural similarity between nodes, providing more information to assess structural similarity at each level of the hierarchy.
(2) Construct a weighted multilayer graph where all nodes in the network are present in every layer, and each layer corresponds to a level of the hierarchy in measuring structural similarity. Moreover, edge weights between every node pair within each layer are inversely related to their structural distance.
(3) Use the multilayer graph to generate context for each node. In particular, a biased random walk on the multilayer graph is used to generate node sequences. These sequences are likely to include nodes that are more structurally similar.
(4) Apply a technique to learn latent representations from the context given by the sequences of nodes, for example, Skip-Gram.

Note that struc2vec is quite flexible, as it does not mandate any particular structural similarity measure or representational learning framework. In what follows, we explain in detail each step of struc2vec and provide a rigorous approach to a hierarchical measure of structural similarity.

Let n denote the number of nodes in the network and k∗ its diameter. Let Rk(u) denote the set of nodes at distance (hop count) exactly k ≥ 0 from u in G. Note that R1(u) denotes the set of neighbors of u and, in general, Rk(u) denotes the ring of nodes at distance k. Let s(S) denote the ordered degree sequence of a set S ⊂ V of nodes.

By comparing the ordered degree sequences of the rings at distance k from both u and v we can impose a hierarchy to measure structural similarity. In particular, let fk(u, v) denote the structural distance between u and v when considering their k-hop neighborhoods (all nodes at distance less than or equal to k and all edges among them). In particular, we define:

    fk(u, v) = fk−1(u, v) + g(s(Rk(u)), s(Rk(v))),   k ≥ 0 and |Rk(u)|, |Rk(v)| > 0    (1)

where g(D1, D2) ≥ 0 measures the distance between the ordered degree sequences D1 and D2, and f−1 = 0. Note that by definition fk(u, v) is non-decreasing in k and is defined only when both u and v have nodes at distance k. Moreover, using the ring at distance k in the definition of fk(u, v) forces the comparison between the degree sequences of nodes that are at the same distance from u and v. Finally, note that if the k-hop neighborhoods of u and v are isomorphic, and the isomorphism maps u onto v, then fk−1(u, v) = 0.

A final step is determining the function that compares two degree sequences. Note that s(Rk(u)) and s(Rk(v)) can be of different sizes and their elements are arbitrary integers in the range [0, n − 1] with possible repetitions. We adopt Dynamic Time Warping (DTW) to measure the distance between two ordered degree sequences, a technique that can cope with sequences of different sizes and loosely compares sequence patterns [18, 20].

Informally, DTW finds the optimal alignment between two sequences A and B. Given a distance function d(a, b) for the elements of the sequences, DTW matches each element a ∈ A to some b ∈ B such that the sum of the distances between matched elements is minimized. Since the elements of sequences A and B are degrees of nodes, we adopt the following distance function:

    d(a, b) = max(a, b) / min(a, b) − 1    (2)

Note that when a = b, then d(a, b) = 0. Thus, two identical ordered degree sequences will have zero distance. Also note that by taking the ratio between the maximum and the minimum, the degrees 1 and 2 are much more different than the degrees 101 and 102, a desired property when measuring the distance between node degrees. Finally, while we use DTW to assess the similarity between two ordered degree sequences, any other cost function could be adopted by our framework.
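To make the measure concrete, the following self-contained Python sketch computes the rings Rk, the ordered degree sequences s(Rk(·)), a standard DTW alignment using the element distance of Equation (2) in the role of g, and the accumulation of Equation (1). It illustrates the definitions above rather than the authors' released implementation; the example graph and the cutoff k_max are arbitrary choices.

    import math
    import networkx as nx

    def ring(G, u, k):
        # R_k(u): nodes at hop distance exactly k from u.
        lengths = nx.single_source_shortest_path_length(G, u, cutoff=k)
        return [v for v, d in lengths.items() if d == k]

    def ordered_degree_sequence(G, nodes):
        # s(S): ordered (sorted) degree sequence of a set of nodes.
        return sorted(G.degree(v) for v in nodes)

    def elem_dist(a, b):
        # Equation (2); assumes degrees >= 1, which holds inside a connected component.
        return max(a, b) / min(a, b) - 1.0

    def dtw(seq_a, seq_b):
        # Classic O(|A||B|) dynamic time warping with the element cost above (the role of g).
        n, m = len(seq_a), len(seq_b)
        cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                c = elem_dist(seq_a[i - 1], seq_b[j - 1])
                cost[i][j] = c + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
        return cost[n][m]

    def structural_distance(G, u, v, k_max):
        # Equation (1): accumulate g over k = 0, ..., k_max while both rings are non-empty.
        f = 0.0
        for k in range(k_max + 1):
            ru, rv = ring(G, u, k), ring(G, v, k)
            if not ru or not rv:
                break
            f += dtw(ordered_degree_sequence(G, ru), ordered_degree_sequence(G, rv))
        return f

    G = nx.barbell_graph(5, 6)                        # two cliques joined by a path
    print(structural_distance(G, 0, 15, k_max=2))     # nodes in opposite cliques: distance 0
    print(structural_distance(G, 0, 7, k_max=2))      # clique node vs. path node: larger distance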
makes it impossible for nodes in K1 and K2 to appear in the same context.

struc2vec, on the other hand, learns representations that properly separate the equivalence classes, placing structurally equivalent nodes near one another in the latent space. Note that nodes of the same color are tightly grouped together. Moreover, p1 and p10 are placed close to the representations for nodes in K1 and K2, as they are the bridges. Finally, note that none of the three optimizations has any significant effect on the quality of the representations. In fact, structurally equivalent nodes are even closer to one another in the latent representations under OPT1.

Last, we apply RolX to the barbell graph (results in Figure 2(b)). A total of six roles were identified, and some roles indeed precisely captured structural equivalence (roles 1 and 3). However, structurally equivalent nodes (in K1 and K2) were placed in three different roles (roles 0, 2, and 5), while role 4 contains all remaining nodes in the path. Thus, although RolX does capture some notion of structural equivalence when assigning roles to nodes, struc2vec better identifies and separates structural equivalence.

4.2 Karate network
Zachary's Karate Club [25] is a network composed of 34 nodes and 78 edges, where each node represents a club member and an edge denotes that two members have interacted outside the club. In this network, edges are commonly interpreted as indications of friendship between members.

We construct a network composed of two copies G1 and G2 of the Karate Club network, where each node v ∈ V(G1) has a mirror node u ∈ V(G2). We also connect the two networks by adding an edge between the mirrored node pair 1 and 37. Although this is not necessary for our framework, DeepWalk and node2vec cannot place in the same context nodes in different connected components of the graph. Thus, we add the edge for a fairer comparison.

Figure 4: (a) Mirrored Karate network. Identical colors correspond to mirrored nodes. (b) Roles identified by RolX.
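The mirrored network described above is straightforward to reproduce; a minimal networkx sketch follows. The relabeling scheme (copies labeled 1 to 34 and 35 to 68) is an assumption made for illustration, and the added edge joins node 1 to its mirror (the pair labeled 1 and 37 in the text's own numbering).

    import networkx as nx

    # Two copies of Zachary's Karate Club (34 nodes, 78 edges each).
    G1 = nx.convert_node_labels_to_integers(nx.karate_club_graph(), first_label=1)
    G2 = nx.relabel_nodes(G1, {v: v + 34 for v in G1.nodes()})

    # Disjoint union of the two copies, plus one edge between a mirrored pair.
    mirrored = nx.union(G1, G2)
    mirrored.add_edge(1, 1 + 34)   # node 1 and its mirror

    print(mirrored.number_of_nodes(), mirrored.number_of_edges())   # 68 nodes, 157 edges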