0% found this document useful (0 votes)

11 views214 pages

Community Detection

Uploaded by

Mateo W Racca

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views214 pages

Community Detection

Uploaded by

Mateo W Racca

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Unraveling the community

structure of networks
David Soriano Paños
So far in the course Microscopic indicators
So far in the course Microscopic indicators

- Degree
So far in the course Microscopic indicators

- Degree
- Clustering
So far in the course Microscopic indicators

- Degree
- Clustering
- Betweenness
…
So far in the course Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models
So far in the course Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models

- ER model
So far in the course Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models

- ER model
- BA model
So far in the course Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models

- ER model
- BA model
- Con guration model
fi
So far in the course Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models

- ER model
- BA model
- Con guration model
- WS model
fi
Scales in networks Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models

- ER model
- BA model
- Con guration model
- WS model
fi
Scales in networks Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models

- ER model
- BA model
- Con guration model
- WS model
fi
Scales in networks Microscopic indicators

- Degree
- Clustering
- Betweenness
…

Models

- ER model
- BA model
- Con guration model
- WS model

Community structure
fi
Real networks have modular structure

D. Marbach et al. Nat. Methods 9, 8 (2012)

Nodes: Genes

Links: Regulation of transcription factors

Communities: Genes responsible for similar functions

Real networks have modular structure

X. Barandian et al. Front. Psychol. 11:1549 (2020)

Nodes: Twitter accounts

Links: Retweets during Spanish 2019 elections

Communities: Political parties

Real networks have modular structure
owner

instructor

W. W. Zachary, J. Anthropol. Res. 33, 452 (1977).

Nodes: Members of a Karate Club Credits: Clara Granell

Links: Interactions outside the club

Communities: The division in two new clubs after a ght between the
owner and the instructor
fi
No clear de nition of what’s a community
Schaub et al. Applied Network Science 2:4 (2017)

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
fi
No clear de nition of what’s a community
Schaub et al. Applied Network Science 2:4 (2017)

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2
1
4
3

5
6
7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2
1
4
3

5
6
7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2
1 1
4 4
3 3

5 5
6 6
7 7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2 2
1 1 1
4 4 4
3 3 3

5 5 5
6 6 6
7 7 7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2 2
1 1 1
4 4 4
3 3 3

5 5 5
6 6 6
7 7 7 …
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2 2
1 1 1
4 4 4
3 3 3

5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2 2
1 1 1
4 4 4
3 3 3

5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number

1 ∞ kn
e∑
Bn =
k=0
k!
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2 2
1 1 1
4 4 4
3 3 3

5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number

1 ∞ kn {B} = 1,2,5,15,52,203,877…

e∑
Bn =
k=0
k!
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2 2
1 1 1
4 4 4
3 3 3

5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number

1 ∞ kn {B} = 1,2,5,15,52,203,877…

e∑
Bn =
k=0
k! 877 con gurations for a
tiny network of 7 nodes!
fi
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset

2 2 2
1 1 1
4 4 4
3 3 3

5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number

1 ∞ kn {B} = 1,2,5,15,52,203,877… We need e cient algorithms to

e∑
Bn = uncover the modular structure of
k=0
k! 877 con gurations for a
tiny network of 7 nodes! networks
fi
ffi
fi
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Cut-based perspective

B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)

Cut-based perspective

Outcome: Roughly equally sized groups

with minimal number of connections
established among them. Note that this is
independent from the density of internal
connections

B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)

Cut-based perspective

Outcome: Roughly equally sized groups

with minimal number of connections
established among them. Note that this is
independent from the density of internal
connections

Problems:
De nition of the number of partitions
(Otherwise a single partition)

B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)

fi
Cut-based perspective

Outcome: Roughly equally sized groups

with minimal number of connections
established among them. Note that this is
independent from the density of internal
connections

Outcome: Roughly equally sized groups

with minimal number of connections
established among them. Note that this is
independent from the density of internal
connections

Outcome: Roughly equally sized groups

with minimal number of connections
established among them. Note that this is
independent from the density of internal
connections

Problems:
De nition of the number of partitions
(Otherwise a single partition)
De nition of the minimum size of the
partitions
B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)
(Otherwise single-node partitions)
Applications:
Circuit layout and design
fi
fi
Cut-based perspective

Outcome: Roughly equally sized groups

with minimal number of connections
established among them. Note that this is
independent from the density of internal
connections

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Clustering-perspective: Hierarchical clustering
Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes
Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes

Cosine similarity/Jaccard index between the set of neighbors of two nodes

Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes

Cosine similarity/Jaccard index between the set of neighbors of two nodes

Paths connecting two nodes
Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes

Cosine similarity/Jaccard index between the set of neighbors of two nodes

Paths connecting two nodes

Agglomerative approaches Divisive approaches

Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes

Cosine similarity/Jaccard index between the set of neighbors of two nodes

Paths connecting two nodes

Agglomerative approaches Divisive approaches

Bottom-up approach
Start from N communities of a single node
Group closest nodes sequentially
Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes

Cosine similarity/Jaccard index between the set of neighbors of two nodes

Paths connecting two nodes

Agglomerative approaches Divisive approaches

Bottom-up approach Top-down approach

Start from N communities of a single node Start from a community with N individuals
Group closest nodes sequentially Separate least similar nodes
Clustering-perspective: Hierarchical clustering

Agglomerative approaches Divisive approaches

Bottom-up approach Top-down approach

Start from N communities of a single node Start from a community with N individuals
Group closest nodes sequentially Separate least similar nodes

Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes
Clustering-perspective: Hierarchical clustering

Agglomerative approaches Divisive approaches

Bottom-up approach Top-down approach

Start from N communities of a single node Start from a community with N individuals
Group closest nodes sequentially Separate least similar nodes

Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes

Outcome:

Source: [Link]
Clustering-perspective: Hierarchical clustering

Agglomerative approaches Divisive approaches

Bottom-up approach Top-down approach

Start from N communities of a single node Start from a community with N individuals
Group closest nodes sequentially Separate least similar nodes

Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes

Outcome: Problem:

Poorly connected nodes misclassi ed

despite clear group membership
fi
Clustering-perspective: Hierarchical clustering

Agglomerative approaches Divisive approaches

Bottom-up approach Top-down approach

Start from N communities of a single node Start from a community with N individuals
Group closest nodes sequentially Separate least similar nodes

Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes

Outcome: Problem:

Poorly connected nodes misclassi ed

despite clear group membership

Source: [Link]
fi
Clustering-perspective: Girvan-Newman algorithm
Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness
Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002)

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

Girvan-Newman algorithm

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

Girvan-Newman algorithm

1. Compute the edge betweeness for all

the connections

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

Girvan-Newman algorithm

1. Compute the edge betweeness for all

the connections
2. Remove the connection with highest
edge betweenness

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

Girvan-Newman algorithm

1. Compute the edge betweeness for all

the connections
2. Remove the connection with highest
edge betweenness

3. Recompute edge betweeness after

edge removal

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

Girvan-Newman algorithm

1. Compute the edge betweeness for all

the connections
2. Remove the connection with highest
edge betweenness

3. Recompute edge betweeness after

edge removal

4. Repeat from step 2 until no more

edges can be removed

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

Girvan-Newman algorithm

1. Compute the edge betweeness for all

the connections
2. Remove the connection with lowest
edge betweenness

3. Recompute edge betweeness after

edge removal

4. Repeat from step 2 until no more

edges can be removed

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

2 communities

6 communities

9 communities

M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) ff

Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness

Edges with higher betweenness are more likely to split the network in di erent components

The disruption of the network gives rise to a hierarchies of communities

2 communities

6 communities

9 communities

How can we assess how ‘good’ a

partition is to re ect the modular
M. Girvan and MEJ Newman PNAS 99 (12), 7821-7826 (2002) structure of a network?
fl
ff
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u

Is Tr(e) a good indicator of the modular

structure of the network?
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u

Is Tr(e) a good indicator of the modular

structure of the network?

No, because it is maximized when we

just have a single community
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u

∑
au :Fraction of links in the network from community u au = euv
v
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u

∑
au :Fraction of links in the network from community u au = euv
v
auav : Fraction of links connecting communities u and v in a random model
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u

∑
au :Fraction of links in the network from community u au = euv
v
auav : Fraction of links connecting communities u and v in a random model

Modularity Q: Fraction of links within communities compared to their expected number in a random network
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij

Tr(e) :Fraction of links connecting nodes within communities Tr(e) =

∑
euu
u

∑
au :Fraction of links in the network from community u au = euv
v
auav : Fraction of links connecting communities u and v in a random model

Modularity Q: Fraction of links within communities compared to their expected number in a random network

(euu − au2)
∑
Q=
u

kikj
( 2L )
1
2L ∑
Q= Aij − δCiδCj
ij
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Modularity Q: Fraction of links within communities compared to their expected number in a random network

The de nition of modularity changes as a function of the nature of the links in the network
fi
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Modularity Q: Fraction of links within communities compared to their expected number in a random network

The de nition of modularity changes as a function of the nature of the links in the network

Unweighted undirected networks

kikj
2L ij ( 2L )
1
∑
Q= Aij − δCiδCj
fi
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Modularity Q: Fraction of links within communities compared to their expected number in a random network

The de nition of modularity changes as a function of the nature of the links in the network

Unweighted undirected networks Unweighted directed networks

kikj kiinkjout
2L ij ( 2L ) 2L ij ( 2L )
1 1
∑ ∑
Q= Aij − δCiδCj Q= Aij − δCiδCj
fi
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Modularity Q: Fraction of links within communities compared to their expected number in a random network

The de nition of modularity changes as a function of the nature of the links in the network

Unweighted undirected networks Unweighted directed networks

kikj kiinkjout
2L ij ( 2L ) 2L ij ( 2L )
1 1
∑ ∑
Q= Aij − δCiδCj Q= Aij − δCiδCj

Weighted directed networks

( )
1 sisj
∑
Q= Wij − δCiδCj
2s ij 2s
fi
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities

Modularity Q: Fraction of links within communities compared to their expected number in a random network

The de nition of modularity changes as a function of the nature of the links in the network

Unweighted undirected networks Unweighted directed networks

kikj kiinkjout
2L ij ( 2L ) 2L ij ( 2L )
1 1
∑ ∑
Q= Aij − δCiδCj Q= Aij − δCiδCj

Weighted directed networks Weighted signed networks

( )
1 sisj si+sj+ si−sj−
( 2s 2s )
1
∑
Q= Wij − δCiδCj
∑
Q= Wij − + − δCiδCj
2s ij 2s 2 (s + s ) ij
+ − +
fi
Clustering-perspective: Girvan-Newman + Modularity
Zachary-Karate club
Clustering-perspective: Girvan-Newman + Modularity
Zachary-Karate club

We observe two peaks in modularity, corresponding

to 2 and 4 communities
Clustering-perspective: Girvan-Newman + Modularity
Zachary-Karate club

We observe two peaks in modularity, corresponding

to 2 and 4 communities

There is one individual mislabeled in the network

Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections
within communities
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

We need algorithms with some heuristics to smartly explore the partitions space
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

We need algorithms with some heuristics to smartly explore the partitions space

MEJ Newman, PRE 69, 026113 (2004)

Agglomerative greedy algorithm
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

We need algorithms with some heuristics to smartly explore the partitions space

MEJ Newman, PRE 69, 026113 (2004)

Agglomerative greedy algorithm
1. Start with all the nodes in its own community
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

We need algorithms with some heuristics to smartly explore the partitions space

MEJ Newman, PRE 69, 026113 (2004)

Agglomerative greedy algorithm
1. Start with all the nodes in its own community
2. Merge the two communities for which the increase in
modularity is maximal (or decrease in modularity is minimal)
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

We need algorithms with some heuristics to smartly explore the partitions space

MEJ Newman, PRE 69, 026113 (2004)

Agglomerative greedy algorithm
1. Start with all the nodes in its own community
2. Merge the two communities for which the increase in
modularity is maximal (or decrease in modularity is minimal)
3. Repeat step 2 until obtaining a single community
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

We need algorithms with some heuristics to smartly explore the partitions space

MEJ Newman, PRE 69, 026113 (2004)

Agglomerative greedy algorithm
1. Start with all the nodes in its own community
2. Merge the two communities for which the increase in
modularity is maximal (or decrease in modularity is minimal)
3. Repeat step 2 until obtaining a single community
4. Keep the community division with highest modularity
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function

The modularity function requires a given partition of the network

We cannot maximize modularity by exploring all the partitions

We need algorithms with some heuristics to smartly explore the partitions space

MEJ Newman, PRE 69, 026113 (2004) Extremal optimization

Other alternatives
Agglomerative greedy algorithm Simmulated Annealing
1. Start with all the nodes in its own community
Spectral Algorithm
2. Merge the two communities for which the increase in
modularity is maximal (or decrease in modularity is minimal) Louvain method
3. Repeat step 2 until obtaining a single community
Leiden method
4. Keep the community division with highest modularity
…
Clustering-perspective: Problems with modularity
Resolution limit Unexpected partitions
[Link] et al. PNAS 104, 36-41 (2007)

Km Km

Km Km Zachary karate club

Poor detection of small communities Q opt for four communities (GN algorithm)
Clustering-perspective: Resolution limit
Resolution limit
[Link] et al. PNAS 104, 36-41 (2007)

Km Km

Poor detection of small communities

Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Km

Km Km

Poor detection of small communities

Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km

Poor detection of small communities

Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km Easy computation of Q for di erent partitions

Km Km

Poor detection of small communities

ff
Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km Easy computation of Q for di erent partitions

Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n

Km Km

Poor detection of small communities

ff
Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km Easy computation of Q for di erent partitions

Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n

Km Km

Poor detection of small communities

ff
Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km Easy computation of Q for di erent partitions

Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n

Km Km Correct classi cation if Q clique > Q pairs

Km Km

Poor detection of small communities

fi
ff
Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km Easy computation of Q for di erent partitions

Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n

Km Km Correct classi cation if Q clique > Q pairs

Km Km Correct classi cation if m(m − 1) + 2 <n

Poor detection of small communities

fi
fi
ff
Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km Easy computation of Q for di erent partitions

Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n

Km Km Correct classi cation if Q clique > Q pairs

Km Km Correct classi cation if m(m − 1) + 2 <n

Poor detection of small communities Correct classi cation if n < 2L

fi
fi
fi
ff
Clustering-perspective: Resolution limit
m=5
Resolution limit Network with n cliques of size m

[Link] et al. PNAS 104, 36-41 (2007) Links L = nm(m − 1)/2 + n Km

Km Km Easy computation of Q for di erent partitions

Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n

Km Km Correct classi cation if Q clique > Q pairs

Km Km Correct classi cation if m(m − 1) + 2 <n

Poor detection of small communities Correct classi cation if n < 2L

Bad performance for many cliques
fi
fi
fi
ff
Clustering-perspective: Multiresolution algorithms

Problem: The resolution limit from modularity comes from the lack of identi abillity of
small communities.

fi
Clustering-perspective: Multiresolution algorithms

Problem: The resolution limit from modularity comes from the lack of identi abillity of
small communities.

For small modules in a network with modular structure, the density of

connections inside modules might be lower than that between modules

fi
Clustering-perspective: Multiresolution algorithms

Problem: The resolution limit from modularity comes from the lack of identi abillity of
small communities.

For small modules in a network with modular structure, the density of

connections inside modules might be lower than that between modules

Solution to this problem: Tune artificially the number of connections in each

community adding weights in the self-loops.

fi
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.

rmax
r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

rmin
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

rmin
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.
Microscale

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

rmin
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.
Microscale

r r r

2 4 5 2

6 8 1

r r r r

A. Arenas et al. NJP 10, 053039 (2008)

rmin
Macroscale
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community
Zachary-Karate club

2
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community
Zachary-Karate club

Optimal community: Partition remaining more stable to the addition of self-loops in

the network
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community
Zachary-Karate club

Result at “Newman’s” resolution:

4
division in 4 (as expected)
2

Optimal community: Partition remaining more stable to the addition of self-loops in

the network
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community
Zachary-Karate club

Result at “Newman’s” resolution:

4
division in 4 (as expected)
2 Most “stable” partition:
the one in 2 communities

Optimal community: Partition remaining more stable to the addition of self-loops in

the network
Clustering-perspective: Problems with modularity
Resolution limit Unexpected partitions
[Link] et al. PNAS 104, 36-41 (2007)

Km Km

Km Km Zachary karate club

Poor detection of small communities Q opt for four communities (GN algorithm)
Clustering-perspective: Problems with modularity
Unexpected partitions

Zachary karate club

Q opt for four communities (GN algorithm)
Clustering-perspective: Problems with modularity
Unexpected partitions

Zachary karate club

Q opt for four communities (GN algorithm)
We have seen that multiresolution algorithm based on modularity might solve the
resolution limit problem for some networks
Clustering-perspective: Problems with modularity
Unexpected partitions
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3

Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3

Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling

Do these networks have modular structure?

Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3

Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling

Network 1 seems random whereas Networks 2 & 3 seem to have modular structure
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3

Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling

Network 1 seems random whereas Networks 2 & 3 seem to have modular structure

But… all of them were generated following an Erdös-Rényi model with ⟨k⟩ =3
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3

Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling

Network 1 seems random whereas Networks 2 & 3 seem to have modular structure

But… all of them were generated following an Erdös-Rényi model with ⟨k⟩ =3
High Q values for networks generated by a model without modular structure
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3

Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling

Network 1 seems random whereas Networks 2 & 3 seem to have modular structure

But… all of them were generated following an Erdös-Rényi model with ⟨k⟩ =3
High Q values for networks generated by a model without modular structure
Modularity might be ill-defined
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3

Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling

High Q values for networks generated by a model without modular structure

These configurations are not fabricated but very unlikely to appear given their
generative model
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Generative models
Generative models

Clustering perspective
Cut-based perspective

“Frequentist” approach: We
fully rely on data, without
questioning its generative
model. We get information
from the adjacency matrix A
Generative models

Clustering perspective
Cut-based perspective

“Frequentist” approach: We
fully rely on data, without
questioning its generative
model. We get information
from the adjacency matrix A
Generative models

Clustering perspective Stochastic models

Cut-based perspective
Bayesian approach: The
“Frequentist” approach: We adjacency matrix A is just a
fully rely on data, without sample of the ensemble of
questioning its generative matrices generated by a given
model. We get information model. We obtain community
from the adjacency matrix A structure from the generative
model
Bayes theorem

Two events A and B

T. Bayes (1702-1761)
Bayes theorem

Two events A and B

P(A ∩ B) = P(A | B)P(B)

T. Bayes (1702-1761)
Bayes theorem

Two events A and B

P(A ∩ B) = P(A | B)P(B) P(A ∩ B) = P(B | A)P(A)

T. Bayes (1702-1761)
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case
Ã Healthy individual
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case
Ã Healthy individual
B Test+
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual
B Test+
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+ True positive P(B | A)P(A)

Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+ True positive P(B | A)P(A)

False positive P(B | Ã) P(Ã)
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+ True positive P(B | A)P(A)

False positive P(B | Ã) P(Ã)

P(B | A) = VP/(VP + FN )
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+ True positive P(B | A)P(A)

False positive P(B | Ã) P(Ã)

P(B | A) = VP/(VP + FN )
P(B | Ã) = FP/(FP + VN )
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+ True positive P(B | A)P(A)

False positive P(B | Ã) P(Ã)
S = VP/(VP + FN) Test sensitivity
P(B | A) = VP/(VP + FN )
P(B | Ã) = FP/(FP + VN )
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+ True positive P(B | A)P(A)

False positive P(B | Ã) P(Ã)
S = VP/(VP + FN) Test sensitivity
P(B | A) = VP/(VP + FN )
E = VN/(VN + FP) Test speci city
P(B | Ã) = FP/(FP + VN )
fi
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case P(A) Seroprevalence in the population

Ã Healthy individual P(B) Probability of positive test

B Test+ True positive P(B | A)P(A)

False positive P(B | Ã) P(Ã)
S = VP/(VP + FN) Test sensitivity
P(B | A) = VP/(VP + FN ) = S
E = VN/(VN + FP) Test speci city
P(B | Ã) = FP/(FP + VN ) = 1 − E
fi
Bayes theorem

Two events A and B

P(B | A)P(A)
Bayes theorem P(A | B) =
P(B) T. Bayes (1702-1761)

Example
What’s the probability of being a COVID-19
case given a positive test?

A COVID-19 case SP(A)

P(A) Seroprevalence in the population
P(A | B) =
Ã Healthy individual SP(A) + (1
P(B) Probability of − E)(1 −
positive P(A))
test

B Test+ S = 65.3 %True 99.9 % P(B | A)P(A)

E =positive

We want to infer the most likely partition b generating the adjacency matrix A
Generative models

We want to infer the most likely partition b generating the adjacency matrix A

Memories from Bayesian inference lesson:

(A) Parameters θ ⃗ = (θ1, …, θm)

(B) Data x ⃗ = (x1, …, xn)

P( x ⃗ | ⃗ θ)⃗
θ)P(
P(θ ⃗ | x)⃗ =
P(x)⃗
Generative models

We want to infer the most likely partition b generating the adjacency matrix A

Memories from Bayesian inference lesson:

(A) Parameters b = (b1, …, bN ) (Module to which each node belongs)

(B) Data A
Generative models

We want to infer the most likely partition b generating the adjacency matrix A

Memories from Bayesian inference lesson:

(A) Parameters b = (b1, …, bN ) (Module to which each node belongs)

(B) Data A

P(A | b)P(b)
P(b | A) =
P(A)
Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Stochastic Block model (SBM): The generative model is a matrix p which
gives us the probability of a link connecting blocks r and s
Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Stochastic Block model (SBM): The generative model is a matrix p which
gives us the probability of a link connecting blocks r and s

∫
P(A | b) = P(A | p, b)P(p | b)dp
Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Stochastic Block model (SBM): The generative model is a matrix p which
gives us the probability of a link connecting blocks r and s

∫
P(A | b) = P(A | p, b)P(p | b)dp
Example matrix p
Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Stochastic Block model (SBM): The generative model is a matrix p which
gives us the probability of a link connecting blocks r and s

∫
P(A | b) = P(A | p, b)P(p | b)dp
Example matrix p
One network generated
from p
Stochastic block model

Example matrix p
One network generated
from p
Stochastic block model

Example matrix p
One network generated
from p

P(A | p, b) ?
Stochastic block model

Example matrix p
One network generated
from p

P(A | p, b) ?
Aij
− pbi,bj)1−Aij
∏ i j
P(A | p, b) = pb ,b (1 Stochastic Block model (SBM)
i<j
Stochastic block model

Example matrix p
One network generated
from p

P(A | p, b) ?
Aij
− pbi,bj)1−Aij
∏ i j
P(A | p, b) = pb ,b (1 Stochastic Block model (SBM)
i<j
−λbi,bj Aij
e λbi,bj e −λbi,bi /2(λbi,bi /2)Aii /2 SBM for network with
∏ ∏
P(A | λ, b) = multiedges and self-loops
i<j
Aij ! i
(Aii /2)!
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

We want to introduce a new matrix λ which gives us the expected number of
connections between blocks r and s

P(b | A) ∝ P(A | b)P(b)

We want to introduce a new matrix λ which gives us the expected number of
connections between blocks r and s

∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

We want to introduce a new matrix λ which gives us the expected number of
connections between blocks r and s

∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term

P(λ | b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

We want to introduce a new matrix λ which gives us the expected number of
connections between blocks r and s

∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term

P(λ | b)

P(b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

We want to introduce a new matrix λ which gives us the expected number of
connections between blocks r and s

∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term

P(λ | b) Maximum entropy distribution satisfying that the total number of

connections in the network should be equal to the observed one

P(b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

We want to introduce a new matrix λ which gives us the expected number of
connections between blocks r and s

∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term

P(λ | b) Maximum entropy distribution satisfying that the total number of

connections in the network should be equal to the observed one

P(b) Hierarchical prior with three levels: Uninformed number of modules

B, sizes of each group n and partitions b
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Combining all the information explained before we obtain:

ers : Exact number of edges

P(A | b)P(b) = P(A | e, b)P(e, b) between blocks r and s in data
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Combining all the information explained before we obtain:

ers : Exact number of edges

P(A | b)P(b) = P(A | e, b)P(e, b) between blocks r and s in data

B P(A | e, b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Combining all the information explained before we obtain:

ers : Exact number of edges

P(A | b)P(b) = P(A | e, b)P(e, b) between blocks r and s in data

B P(A | e, b)
B P(e, b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Combining all the information explained before we obtain:

ers : Exact number of edges

P(A | b)P(b) = P(A | e, b)P(e, b) between blocks r and s in data

B P(A | e, b)
B P(e, b)
Optimal value B of the number of
modules grounded in Bayesian
probability
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Combining all the information explained before we obtain:

ers : Exact number of edges

P(A | b)P(b) = P(A | e, b)P(e, b) between blocks r and s in data

B P(A | e, b) P(A | b)P(b) = 2−Σ

B P(e, b) Σ : Description length (bits)

Optimal value B of the number of

modules grounded in Bayesian
probability
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A

P(b | A) ∝ P(A | b)P(b)

Combining all the information explained before we obtain:

ers : Exact number of edges

P(A | b)P(b) = P(A | e, b)P(e, b) between blocks r and s in data

B P(A | e, b) P(A | b)P(b) = 2−Σ

B P(e, b) Σ : Description length (bits)

Optimal value B of the number of Bayesian inference minimizes

modules grounded in Bayesian the amount of information
probability needed to describe data
Stochastic block model
Nodes: Teams in Division-1 college football competition in the US
Links: Matches between teams
Stochastic block model
Nodes: Teams in Division-1 college football competition in the US
Links: Matches between teams

SBM optimizes the number of modules to B = 10 matching conferences

Stochastic block model
Nodes: Teams in Division-1 college football competition in the US
Links: Matches between teams

SBM optimizes the number of modules to B = 10 matching conferences

Some teams are misclassi ed due to the high frequency of interconference
matches
fi
Stochastic block model
Network: Zachary Karate Club network

b0 Con guration model

b1 Leader+closest follower,
rest of the group
b2 Expected partition
fi
Stochastic block model
Network: Zachary Karate Club network

b0 Con guration model

b1 Leader+closest follower,
rest of the group
b2 Expected partition

Most parsimonious
generative model:
Con guration model
fi
fi
Stochastic block model
Network: Zachary Karate Club network

b0 Con guration model

b1 Leader+closest follower,
rest of the group
b2 Expected partition

Most parsimonious
generative model:
Con guration model

But enough evidence not

to discard the existence of
communities
fi
fi
Stochastic block model vs Modularity optimization
Modularity optimization SBM
Stochastic block model vs Modularity optimization
Modularity optimization SBM
Stochastic block model vs Modularity optimization
Modularity optimization SBM

The SBM nds the most parsimonious block partition of the network,
without the need for having more internal than external connections
fi
Degree-corrected SBM
The SBM assumes that, within each module, each node is equally connected
Degree corrected SBM: Accounts for possible heterogeneities inside each
group with a parameter θi controlling the expected degree of each node i

Nodes: Political blogs during US 2004 elections

Links: Mentions (url) of one blog on the other

Partition according to the SBM Partition according to the DC-SBM

Nested SBM
The SBM (or DC-SBM) only explores communities at one resolution level

For a more hierarchical picture of the di erent group structures at di erent scales,
we should use the Nested SBM

Nested SBM
1. Apply the DC-SBM to the original
network to get a block structure e with B0
modules
2. Assume that the block structure of the
previous hierarchical level l − 1 is a new
network
3. Apply DC-SBM to get the block
structure at level l
4. Repeat from step 2 until a single block
is found
ff
ff
Nested SBM
Nodes: People posting tweets during Catalonia con ict in 2017
Links: Retweets

J. Atienza-Barthelemy. Sci. Rep. 9,

17148 (2019).
fl
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Dynamical perspective: Infomap
Dynamical perspective: Groups of nodes with similar dynamical function with respect
to the rest of the network
Random walk trajectory in a network

M. Rosvall and C. Bergstrom PNAS 105 (4), 1118-1123 (2008)

Dynamical perspective: Infomap
Dynamical perspective: Groups of nodes with similar dynamical function with respect
to the rest of the network
Random walk trajectory in a network

How can we describe the

trajectory of the random
walker with the minimum
description length?

M. Rosvall and C. Bergstrom PNAS 105 (4), 1118-1123 (2008)

Dynamical perspective: Infomap
Dynamical perspective: Groups of nodes with similar dynamical function with respect
to the rest of the network
Random walk trajectory in a network Assign a unique number to each node

M. Rosvall and C. Bergstrom PNAS 105 (4), 1118-1123 (2008)

L = 243 bits
Dynamical perspective: Infomap
Dynamical perspective: Groups of nodes with similar dynamical function with respect
to the rest of the network
Random walk trajectory in a network Assign a unique number to each module
Reuse nodes numbers

M. Rosvall and C. Bergstrom PNAS 105 (4), 1118-1123 (2008)

To de ne the number of modules and its composition, the authors use a greedy
agglomerative algorithm based on the description length of the trajectory
fi
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)

Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).

Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them

Stochastically equivalent nodes: Partitions of nodes sharing similar statistical properties

regarding their structure of connections.

Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)

For extremely sparse networks, the community structure becomes undetectable

even with statistically grounded formalism such as SBM
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)

For extremely sparse networks, the community structure becomes undetectable

even with statistically grounded formalism such as SBM

The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models

fl
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)

For extremely sparse networks, the community structure becomes undetectable

even with statistically grounded formalism such as SBM

The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models

Metadata as ground-truth communities?

fl
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)

For extremely sparse networks, the community structure becomes undetectable

even with statistically grounded formalism such as SBM

The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models

Metadata as ground-truth communities?

Di erent reasons, not related to the algorithm itself, may explain a bad performance
of an algorithm in detecting communities:
ff
fl
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)

For extremely sparse networks, the community structure becomes undetectable

even with statistically grounded formalism such as SBM

The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models

Metadata as ground-truth communities?

Di erent reasons, not related to the algorithm itself, may explain a bad performance
of an algorithm in detecting communities:
Metadata is irrelevant for network structure
ff
fl
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)

For extremely sparse networks, the community structure becomes undetectable

even with statistically grounded formalism such as SBM

The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models

Metadata as ground-truth communities?

Di erent reasons, not related to the algorithm itself, may explain a bad performance
of an algorithm in detecting communities:
Metadata is irrelevant for network structure
Metadata and communities capture different aspects in the network
ff
fl
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)

For extremely sparse networks, the community structure becomes undetectable

even with statistically grounded formalism such as SBM

The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models

Metadata as ground-truth communities?

Di erent reasons, not related to the algorithm itself, may explain a bad performance
of an algorithm in detecting communities:
Metadata is irrelevant for network structure
Metadata and communities capture different aspects in the network
The network lacks group structure
ff
fl
Limitations of community detection

Metadata as ground-truth communities?

Sometimes, relying on the algorithms allows us to learn new information about the
network
Limitations of community detection

Metadata as ground-truth communities?

Sometimes, relying on the algorithms allows us to learn new information about the
network

Zachary, 1977:

'' He was a weak supporter of John but joined Mr. Hi's club
after the split. This can be explained by noting that he was
only three weeks away from a test for black belt
(master status) when the split in the club occurred.
Had he joined the o cers' club he would have had to
give up his rank and begin again in a new style of
karate with a white (beginner's) belt, since the o cers
had decided to change the style of karate practiced in their
new club. Having four years of study invested in the style of
Mr. Hi, the individual could not bring himself to repudiate his
rank and start again. ''
ffi
ffi

Sna Unit III
No ratings yet
Sna Unit III
10 pages
Community Structure in Large Networks
No ratings yet
Community Structure in Large Networks
30 pages
Social Network Security 2 Marks
No ratings yet
Social Network Security 2 Marks
3 pages
Gionis
No ratings yet
Gionis
191 pages
Understanding Network Structure and Metrics
No ratings yet
Understanding Network Structure and Metrics
29 pages
Social Network Analysis Unit-3
No ratings yet
Social Network Analysis Unit-3
28 pages
Newman 2004
No ratings yet
Newman 2004
15 pages
Community Detection Algorithms Explained
No ratings yet
Community Detection Algorithms Explained
109 pages
Unit III Brief
No ratings yet
Unit III Brief
21 pages
A Divisive Hierarchical Structural Clustering Algorithm For Networks
No ratings yet
A Divisive Hierarchical Structural Clustering Algorithm For Networks
6 pages
Complex Networks Course Overview
No ratings yet
Complex Networks Course Overview
46 pages
Community Structure in Social and Biological Networks: M. Girvan and M. E. J. Newman
No ratings yet
Community Structure in Social and Biological Networks: M. Girvan and M. E. J. Newman
6 pages
Web Social Network Community Mining
No ratings yet
Web Social Network Community Mining
19 pages
04 Communities
No ratings yet
04 Communities
78 pages
Persistence and Periodicity in A Dynamic Proximity Network
No ratings yet
Persistence and Periodicity in A Dynamic Proximity Network
5 pages
A Classification of Well-Behaved Graph Clustering Schemes: Vilhelm Agdur April 5, 2024
No ratings yet
A Classification of Well-Behaved Graph Clustering Schemes: Vilhelm Agdur April 5, 2024
16 pages
Complex Network Theory CS60078: Department of Computer Science & Engineering, IIT Kharagpur
No ratings yet
Complex Network Theory CS60078: Department of Computer Science & Engineering, IIT Kharagpur
36 pages
Network Theory Basics
No ratings yet
Network Theory Basics
66 pages
Wa0007.
No ratings yet
Wa0007.
25 pages
Graph Sample (Part 1)
No ratings yet
Graph Sample (Part 1)
15 pages
EECS 203: Discrete Math Discussion Notes
No ratings yet
EECS 203: Discrete Math Discussion Notes
10 pages
Community Detection and Evaluation
No ratings yet
Community Detection and Evaluation
46 pages
Understanding Complex Networks
No ratings yet
Understanding Complex Networks
90 pages
Networks BigData 1
No ratings yet
Networks BigData 1
43 pages
PPT10-W10-Graph Analytics For Big Data
No ratings yet
PPT10-W10-Graph Analytics For Big Data
55 pages
Community Analysis in Large Networks
No ratings yet
Community Analysis in Large Networks
30 pages
Radicchi 2004
No ratings yet
Radicchi 2004
6 pages
Community Structure Identification in Graphs
No ratings yet
Community Structure Identification in Graphs
10 pages
Graphs and Relations Ann
No ratings yet
Graphs and Relations Ann
62 pages
Graph Theory Cambridge U
No ratings yet
Graph Theory Cambridge U
75 pages
Basics of Network Analysis
No ratings yet
Basics of Network Analysis
38 pages
1904 02690 PDF
No ratings yet
1904 02690 PDF
48 pages
First Half Scribe
No ratings yet
First Half Scribe
33 pages
Newman Survey
No ratings yet
Newman Survey
91 pages
Graph Theory in Economics & Finance
No ratings yet
Graph Theory in Economics & Finance
24 pages
Module 3
No ratings yet
Module 3
36 pages
Intro to Graph Theory Concepts
No ratings yet
Intro to Graph Theory Concepts
29 pages
Networks Sna
No ratings yet
Networks Sna
126 pages
Community Moore
No ratings yet
Community Moore
6 pages
PHD Defence Slides
No ratings yet
PHD Defence Slides
86 pages
Graph Connectivity Explained
No ratings yet
Graph Connectivity Explained
6 pages
2 Grouping
No ratings yet
2 Grouping
43 pages
WSC Week4 Community
No ratings yet
WSC Week4 Community
77 pages
Understanding Clustering in Networks
No ratings yet
Understanding Clustering in Networks
38 pages
1 s2.0 S0370157309002841 Main PDF
No ratings yet
1 s2.0 S0370157309002841 Main PDF
100 pages
Newman - The Structure and Function of Complex Networks
No ratings yet
Newman - The Structure and Function of Complex Networks
90 pages
Newman 2003 Structure
No ratings yet
Newman 2003 Structure
90 pages
Fafl Assignment Name: Prabhakar Gouda Assignment No: 01,02,03 Roll No: 35 Week No: 01 SEM: V Cs&E Section: B
No ratings yet
Fafl Assignment Name: Prabhakar Gouda Assignment No: 01,02,03 Roll No: 35 Week No: 01 SEM: V Cs&E Section: B
12 pages
Network Communities and Affiliation Networks-1
No ratings yet
Network Communities and Affiliation Networks-1
20 pages
Newman The Structure and Function of Complex Networks
No ratings yet
Newman The Structure and Function of Complex Networks
90 pages
DM UNIT4 Notes
No ratings yet
DM UNIT4 Notes
8 pages
Graph Theory Training for Teachers
100% (2)
Graph Theory Training for Teachers
47 pages
Topological Characteristics of IP Networks: Hamed Haddadi
No ratings yet
Topological Characteristics of IP Networks: Hamed Haddadi
4 pages
3.2 Detecting Communities in Social Networks
No ratings yet
3.2 Detecting Communities in Social Networks
6 pages
Communities and Bottlenecks Trees and Treelike Networks Have High Modularity
No ratings yet
Communities and Bottlenecks Trees and Treelike Networks Have High Modularity
9 pages
Training 2024
No ratings yet
Training 2024
20 pages
FALLSEM2018-19 - CSE3021 - ETH - SJT824 - VL2018191006149 - Reference Material I - Module3 - CommunityNetworks1
No ratings yet
FALLSEM2018-19 - CSE3021 - ETH - SJT824 - VL2018191006149 - Reference Material I - Module3 - CommunityNetworks1
98 pages
Graph Theory 1-11 PDF
No ratings yet
Graph Theory 1-11 PDF
13 pages
Data Analytics Made Accessible
100% (7)
Data Analytics Made Accessible
296 pages
Density Gradient Estimation Methods
No ratings yet
Density Gradient Estimation Methods
9 pages
Intro To Comp
No ratings yet
Intro To Comp
59 pages
Eti 22618
0% (1)
Eti 22618
11 pages
A Survey of Internet of Things and Cyber-Physical
No ratings yet
A Survey of Internet of Things and Cyber-Physical
20 pages
Clustering: Unsupervised Learning Methods 15-381
No ratings yet
Clustering: Unsupervised Learning Methods 15-381
25 pages
Beyond The Worst-Case Analysis of Algorithms 1st Edition Tim Roughgarden PDF Available
100% (1)
Beyond The Worst-Case Analysis of Algorithms 1st Edition Tim Roughgarden PDF Available
57 pages
Literature Review On Distribution Channel PDF
100% (3)
Literature Review On Distribution Channel PDF
6 pages
Alleviating The Data Sparsity Problem of Recommender Systems by Clustering Nodes in Bipartite Networks
No ratings yet
Alleviating The Data Sparsity Problem of Recommender Systems by Clustering Nodes in Bipartite Networks
10 pages
Machine Learning
No ratings yet
Machine Learning
256 pages
Understanding Mobile Ecosystem Dynamics
No ratings yet
Understanding Mobile Ecosystem Dynamics
13 pages
AI 2020 21 Samplepaper Answerkey
No ratings yet
AI 2020 21 Samplepaper Answerkey
19 pages
Introduction To Data Visualization
No ratings yet
Introduction To Data Visualization
10 pages
AI Project Cycle Overview
No ratings yet
AI Project Cycle Overview
17 pages
Clustering Technique in Writing Recount Text
No ratings yet
Clustering Technique in Writing Recount Text
30 pages
Module 5 - Clustering - Afterclassb
No ratings yet
Module 5 - Clustering - Afterclassb
49 pages
Agricultural Crop Yield
No ratings yet
Agricultural Crop Yield
7 pages
Tunisia Forest Loss Post-Revolution
No ratings yet
Tunisia Forest Loss Post-Revolution
14 pages
Data Reduction Techniques
100% (1)
Data Reduction Techniques
41 pages
بنك اسئلة للمراجعة - تنقيب بيانات
No ratings yet
بنك اسئلة للمراجعة - تنقيب بيانات
24 pages
K Means Cluster Analysis in SPSS
0% (1)
K Means Cluster Analysis in SPSS
2 pages
Debrah Artificial Intelligence Building
No ratings yet
Debrah Artificial Intelligence Building
64 pages
AI in AEC: Review & Future Outlook
No ratings yet
AI in AEC: Review & Future Outlook
18 pages
Gyan Mitra
No ratings yet
Gyan Mitra
50 pages
Bda Mini-Project Report Be-Comps B-50,51,58
No ratings yet
Bda Mini-Project Report Be-Comps B-50,51,58
24 pages
Data Mining: Process and Techniques
No ratings yet
Data Mining: Process and Techniques
45 pages
Kartika - ICFAS 2022 (Terbit Feb 2024)
No ratings yet
Kartika - ICFAS 2022 (Terbit Feb 2024)
18 pages
Konsta, 2022
No ratings yet
Konsta, 2022
29 pages
Ai For Leaders Course
No ratings yet
Ai For Leaders Course
16 pages
PR Unit 1 2
No ratings yet
PR Unit 1 2
40 pages