Community Detection
Community Detection
structure of networks
David Soriano Paños
So far in the course Microscopic indicators
So far in the course Microscopic indicators
- Degree
So far in the course Microscopic indicators
- Degree
- Clustering
So far in the course Microscopic indicators
- Degree
- Clustering
- Betweenness
…
So far in the course Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
So far in the course Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
- ER model
So far in the course Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
- ER model
- BA model
So far in the course Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
- ER model
- BA model
- Con guration model
fi
So far in the course Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
- ER model
- BA model
- Con guration model
- WS model
fi
Scales in networks Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
- ER model
- BA model
- Con guration model
- WS model
fi
Scales in networks Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
- ER model
- BA model
- Con guration model
- WS model
fi
Scales in networks Microscopic indicators
- Degree
- Clustering
- Betweenness
…
Models
- ER model
- BA model
- Con guration model
- WS model
Community structure
fi
Real networks have modular structure
instructor
Communities: The division in two new clubs after a ght between the
owner and the instructor
fi
No clear de nition of what’s a community
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
fi
No clear de nition of what’s a community
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
fi
No clear de nition of what’s a community
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
fi
No clear de nition of what’s a community
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2
1
4
3
5
6
7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2
1
4
3
5
6
7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2
1 1
4 4
3 3
5 5
6 6
7 7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2 2
1 1 1
4 4 4
3 3 3
5 5 5
6 6 6
7 7 7
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2 2
1 1 1
4 4 4
3 3 3
5 5 5
6 6 6
7 7 7 …
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2 2
1 1 1
4 4 4
3 3 3
5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2 2
1 1 1
4 4 4
3 3 3
5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number
1 ∞ kn
e∑
Bn =
k=0
k!
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2 2
1 1 1
4 4 4
3 3 3
5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number
1 ∞ kn {B} = 1,2,5,15,52,203,877…
e∑
Bn =
k=0
k!
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2 2
1 1 1
4 4 4
3 3 3
5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number
1 ∞ kn {B} = 1,2,5,15,52,203,877…
e∑
Bn =
k=0
k! 877 con gurations for a
tiny network of 7 nodes!
fi
fi
Is it trivial to nd partitions in the network?
A partition of a network is a collection of subsets of nodes such that every node
belongs to one subset
2 2 2
1 1 1
4 4 4
3 3 3
5 5 5
6 6 6
7 7 7 …
Number of all the possible partitions of a network of n nodes in k subsets grows faster
than exponentially according to the Bell number
e∑
Bn = uncover the modular structure of
k=0
k! 877 con gurations for a
tiny network of 7 nodes! networks
fi
ffi
fi
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Cut-based perspective
Problems:
De nition of the number of partitions
(Otherwise a single partition)
Problems:
De nition of the number of partitions
(Otherwise a single partition)
De nition of the minimum size of the
partitions
B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)
(Otherwise single-node partitions)
fi
fi
Cut-based perspective
Problems:
De nition of the number of partitions
(Otherwise a single partition)
De nition of the minimum size of the
partitions
B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)
(Otherwise single-node partitions)
Applications:
fi
fi
Cut-based perspective
Problems:
De nition of the number of partitions
(Otherwise a single partition)
De nition of the minimum size of the
partitions
B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)
(Otherwise single-node partitions)
Applications:
Circuit layout and design
fi
fi
Cut-based perspective
Problems:
De nition of the number of partitions
(Otherwise a single partition)
De nition of the minimum size of the
partitions
B.W. Kernigham et al. Bell Syst. Tech. 49, 291–307. (1970)
(Otherwise single-node partitions)
Applications:
Circuit layout and design
Parallelization of processes in scienti c computing
fi
fi
fi
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Clustering-perspective: Hierarchical clustering
Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes
Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes
Bottom-up approach
Start from N communities of a single node
Group closest nodes sequentially
Clustering-perspective: Hierarchical clustering
Approach: Group nodes according to their similarity in terms of network attributes
Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes
Clustering-perspective: Hierarchical clustering
Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes
Outcome:
Source: [Link]
Clustering-perspective: Hierarchical clustering
Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes
Outcome: Problem:
Many algorithms as a function of how we quantify distance between individual nodes and
how we measure distance to a cluster of nodes
Outcome: Problem:
Source: [Link]
fi
Clustering-perspective: Girvan-Newman algorithm
Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness
Clustering-perspective: Girvan-Newman algorithm
Divisive algorithm removing connections based on edge betweeness
Edges with higher betweenness are more likely to split the network in di erent components
Edges with higher betweenness are more likely to split the network in di erent components
Edges with higher betweenness are more likely to split the network in di erent components
Girvan-Newman algorithm
Edges with higher betweenness are more likely to split the network in di erent components
Girvan-Newman algorithm
Edges with higher betweenness are more likely to split the network in di erent components
Girvan-Newman algorithm
Edges with higher betweenness are more likely to split the network in di erent components
Girvan-Newman algorithm
Edges with higher betweenness are more likely to split the network in di erent components
Girvan-Newman algorithm
Edges with higher betweenness are more likely to split the network in di erent components
Girvan-Newman algorithm
Edges with higher betweenness are more likely to split the network in di erent components
2 communities
6 communities
9 communities
Edges with higher betweenness are more likely to split the network in di erent components
2 communities
6 communities
9 communities
Network with N nodes and m links where C(k) denotes the community of node k
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
∑
au :Fraction of links in the network from community u au = euv
v
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
∑
au :Fraction of links in the network from community u au = euv
v
auav : Fraction of links connecting communities u and v in a random model
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
∑
au :Fraction of links in the network from community u au = euv
v
auav : Fraction of links connecting communities u and v in a random model
Modularity Q: Fraction of links within communities compared to their expected number in a random network
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Network with N nodes and m links where C(k) denotes the community of node k
Aij
euv :Fraction of links in the network connecting communities u and v euv = ∑ 2L δCivδCj w
ij
∑
au :Fraction of links in the network from community u au = euv
v
auav : Fraction of links connecting communities u and v in a random model
Modularity Q: Fraction of links within communities compared to their expected number in a random network
(euu − au2)
∑
Q=
u
kikj
( 2L )
1
2L ∑
Q= Aij − δCiδCj
ij
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Modularity Q: Fraction of links within communities compared to their expected number in a random network
The de nition of modularity changes as a function of the nature of the links in the network
fi
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Modularity Q: Fraction of links within communities compared to their expected number in a random network
The de nition of modularity changes as a function of the nature of the links in the network
Modularity Q: Fraction of links within communities compared to their expected number in a random network
The de nition of modularity changes as a function of the nature of the links in the network
Modularity Q: Fraction of links within communities compared to their expected number in a random network
The de nition of modularity changes as a function of the nature of the links in the network
( )
1 sisj
∑
Q= Wij − δCiδCj
2s ij 2s
fi
Clustering-perspective: Modularity function
Reminder: Clustering perspective seeks to maximize the number of connections
inside communities
Modularity Q: Fraction of links within communities compared to their expected number in a random network
The de nition of modularity changes as a function of the nature of the links in the network
( )
1 sisj si+sj+ si−sj−
( 2s 2s )
1
∑
Q= Wij − δCiδCj
∑
Q= Wij − + − δCiδCj
2s ij 2s 2 (s + s ) ij
+ − +
fi
Clustering-perspective: Girvan-Newman + Modularity
Zachary-Karate club
Clustering-perspective: Girvan-Newman + Modularity
Zachary-Karate club
We need algorithms with some heuristics to smartly explore the partitions space
Clustering-perspective: Modularity maximization
Reminder: Clustering perspective seeks to maximize the number of connections within
communities maximize the modularity function
We need algorithms with some heuristics to smartly explore the partitions space
We need algorithms with some heuristics to smartly explore the partitions space
We need algorithms with some heuristics to smartly explore the partitions space
We need algorithms with some heuristics to smartly explore the partitions space
We need algorithms with some heuristics to smartly explore the partitions space
We need algorithms with some heuristics to smartly explore the partitions space
Other alternatives
Agglomerative greedy algorithm Simmulated Annealing
1. Start with all the nodes in its own community
Spectral Algorithm
2. Merge the two communities for which the increase in
modularity is maximal (or decrease in modularity is minimal) Louvain method
3. Repeat step 2 until obtaining a single community
Leiden method
4. Keep the community division with highest modularity
…
Clustering-perspective: Problems with modularity
Resolution limit Unexpected partitions
[Link] et al. PNAS 104, 36-41 (2007)
Km Km
Km Km
Km Km
Km Km
Poor detection of small communities Q opt for four communities (GN algorithm)
Clustering-perspective: Resolution limit
Resolution limit
[Link] et al. PNAS 104, 36-41 (2007)
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km
Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
Km Km
Km Km
Km Km
Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n
Km Km
Km Km
Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n
Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n
Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n
Km Km clique 2 1
Q =1− −
m(m − 1) + 2 n
pairs 1 2
Km Km Q =1− −
m(m − 1) + 2 n
Problem: The resolution limit from modularity comes from the lack of identi abillity of
small communities.
fi
Clustering-perspective: Multiresolution algorithms
Problem: The resolution limit from modularity comes from the lack of identi abillity of
small communities.
fi
Clustering-perspective: Multiresolution algorithms
Problem: The resolution limit from modularity comes from the lack of identi abillity of
small communities.
fi
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community adding weights in the self-loops.
rmax
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
r r r
10
2 4 5 2
6 8 1
r r r r
2
Clustering-perspective: Multiresolution algorithms
Solution to this problem: Tune artificially the number of connections in each
community
Zachary-Karate club
Km Km
Km Km
Km Km
Km Km
Poor detection of small communities Q opt for four communities (GN algorithm)
Clustering-perspective: Problems with modularity
Unexpected partitions
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Network 1 seems random whereas Networks 2 & 3 seem to have modular structure
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Network 1 seems random whereas Networks 2 & 3 seem to have modular structure
But… all of them were generated following an Erdös-Rényi model with ⟨k⟩ =3
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Network 1 seems random whereas Networks 2 & 3 seem to have modular structure
But… all of them were generated following an Erdös-Rényi model with ⟨k⟩ =3
High Q values for networks generated by a model without modular structure
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Network 1 seems random whereas Networks 2 & 3 seem to have modular structure
But… all of them were generated following an Erdös-Rényi model with ⟨k⟩ =3
High Q values for networks generated by a model without modular structure
Modularity might be ill-defined
Clustering-perspective: Problems with modularity
Unexpected partitions
Network 1 Network 2 Network 3
Peixoto, T.P. (2019). Bayesian Stochastic Blockmodeling. In Advances in Network Clustering and Blockmodeling
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Generative models
Generative models
Clustering perspective
Cut-based perspective
“Frequentist” approach: We
fully rely on data, without
questioning its generative
model. We get information
from the adjacency matrix A
Generative models
Clustering perspective
Cut-based perspective
“Frequentist” approach: We
fully rely on data, without
questioning its generative
model. We get information
from the adjacency matrix A
Generative models
T. Bayes (1702-1761)
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
A COVID-19 case
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
A COVID-19 case
à Healthy individual
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
A COVID-19 case
à Healthy individual
B Test+
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
Example
What’s the probability of being a COVID-19
case given a positive test?
B Test+
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
Example
What’s the probability of being a COVID-19
case given a positive test?
Example
What’s the probability of being a COVID-19
case given a positive test?
P(B | A) = VP/(VP + FN )
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
P(B | A) = VP/(VP + FN )
P(B | Ã) = FP/(FP + VN )
Bayes theorem
Example
What’s the probability of being a COVID-19
case given a positive test?
Example
What’s the probability of being a COVID-19
case given a positive test?
Example
What’s the probability of being a COVID-19
case given a positive test?
Example
What’s the probability of being a COVID-19
case given a positive test?
| B) P(B | Ã)%P(Ã)
S = VP/(VP + FN) Test sensitivity P(A) = 1 %False P(A
positive = 87
E = VN/(VN + FP) Test speci city P(B
P(A) = 2%| A) = VP/(VP
P(A | B)+=FN
93) %
P(B | Ã) = FP/(FP + VN )
fi
Generative models
We want to infer the most likely partition b generating the adjacency matrix A
Generative models
We want to infer the most likely partition b generating the adjacency matrix A
P( x ⃗ | ⃗ θ)⃗
θ)P(
P(θ ⃗ | x)⃗ =
P(x)⃗
Generative models
We want to infer the most likely partition b generating the adjacency matrix A
We want to infer the most likely partition b generating the adjacency matrix A
P(A | b)P(b)
P(b | A) =
P(A)
Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | p, b)P(p | b)dp
Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | p, b)P(p | b)dp
Example matrix p
Generative models
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | p, b)P(p | b)dp
Example matrix p
One network generated
from p
Stochastic block model
Example matrix p
One network generated
from p
Stochastic block model
Example matrix p
One network generated
from p
P(A | p, b) ?
Stochastic block model
Example matrix p
One network generated
from p
P(A | p, b) ?
Aij
− pbi,bj)1−Aij
∏ i j
P(A | p, b) = pb ,b (1 Stochastic Block model (SBM)
i<j
Stochastic block model
Example matrix p
One network generated
from p
P(A | p, b) ?
Aij
− pbi,bj)1−Aij
∏ i j
P(A | p, b) = pb ,b (1 Stochastic Block model (SBM)
i<j
−λbi,bj Aij
e λbi,bj e −λbi,bi /2(λbi,bi /2)Aii /2 SBM for network with
∏ ∏
P(A | λ, b) = multiedges and self-loops
i<j
Aij ! i
(Aii /2)!
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
P(λ | b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
P(λ | b)
P(b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
P(b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
∫
P(A | b) = P(A | λ, b)P(λ | b)dλ
For the rest of quantities, we will omit the maths but explain the assumptions
nedeed to construct each term
B P(A | e, b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
B P(A | e, b)
B P(e, b)
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
B P(A | e, b)
B P(e, b)
Optimal value B of the number of
modules grounded in Bayesian
probability
Stochastic block model
We want to infer the most likely partition b⃗ generating the adjacency matrix A
b1 Leader+closest follower,
rest of the group
b2 Expected partition
fi
Stochastic block model
Network: Zachary Karate Club network
b1 Leader+closest follower,
rest of the group
b2 Expected partition
Most parsimonious
generative model:
Con guration model
fi
fi
Stochastic block model
Network: Zachary Karate Club network
b1 Leader+closest follower,
rest of the group
b2 Expected partition
Most parsimonious
generative model:
Con guration model
The SBM nds the most parsimonious block partition of the network,
without the need for having more internal than external connections
fi
Degree-corrected SBM
The SBM assumes that, within each module, each node is equally connected
Degree corrected SBM: Accounts for possible heterogeneities inside each
group with a parameter θi controlling the expected degree of each node i
For a more hierarchical picture of the di erent group structures at di erent scales,
we should use the Nested SBM
Nested SBM
1. Apply the DC-SBM to the original
network to get a block structure e with B0
modules
2. Assume that the block structure of the
previous hierarchical level l − 1 is a new
network
3. Apply DC-SBM to get the block
structure at level l
4. Repeat from step 2 until a single block
is found
ff
ff
Nested SBM
Nodes: People posting tweets during Catalonia con ict in 2017
Links: Retweets
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Dynamical perspective: Infomap
Dynamical perspective: Groups of nodes with similar dynamical function with respect
to the rest of the network
Random walk trajectory in a network
L = 243 bits
Dynamical perspective: Infomap
Dynamical perspective: Groups of nodes with similar dynamical function with respect
to the rest of the network
Random walk trajectory in a network Assign a unique number to each module
Reuse nodes numbers
L = 243 bits
Dynamical perspective: Infomap
Dynamical perspective: Groups of nodes with similar dynamical function with respect
to the rest of the network
Random walk trajectory in a network Assign a unique number to each module
Reuse nodes numbers
To de ne the number of modules and its composition, the authors use a greedy
agglomerative algorithm based on the description length of the trajectory
fi
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Community detection: An open challenge
Schaub et al. Applied Network Science 2:4 (2017)
Cut based perspective: Roughly equally sized groups with minimal number of connections
established among them (regardless of the density of internal connections).
Clustering perspective: Partitions in the network such as the density of internal links is much
higher than the number of connections established among them
Dynamical perspective: Groups of nodes with similar dynamical function with respect to the rest
of the network
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)
The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models
fl
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)
The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models
fl
Limitations of community detection
A. Decelle et al. Phys. Rev. Lett. 107 065701 (2011)
Detectability limit
F. Radicchi. EPL 106 38001 (2014)
The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models
The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models
The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models
The existence of a few connections enhances the role of uctuations and makes
the actual community structure of the network indistinguishable from that expected
from null models
Zachary, 1977:
'' He was a weak supporter of John but joined Mr. Hi's club
after the split. This can be explained by noting that he was
only three weeks away from a test for black belt
(master status) when the split in the club occurred.
Had he joined the o cers' club he would have had to
give up his rank and begin again in a new style of
karate with a white (beginner's) belt, since the o cers
had decided to change the style of karate practiced in their
new club. Having four years of study invested in the style of
Mr. Hi, the individual could not bring himself to repudiate his
rank and start again. ''
ffi
ffi