12/4/24, 8:08 AM Stochastic Blockmodel.
ipynb - Colab
keyboard_arrow_down The Stochastic Blockmodel
import networkx as nx
import numpy as np
import [Link] as plt
import sympy
keyboard_arrow_down The simplest community: a clique
G_clique = nx.from_edgelist([(i,j) for i in range(10) for j in range(10) if i!=j])
[Link](G_clique, pos=nx.circular_layout(G_clique))
# The adjacency matrix is (almost) all ones
A_clique = nx.adjacency_matrix(G_clique).todense()
[Link](A_clique)
# Visualize the adjacency matrix
[Link](A_clique)
[Link]()
[Link] 1/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
<[Link] at 0x167fb20bc20>
keyboard_arrow_down Two communities: the Caveman graph
G_caveman = nx.from_edgelist([(i,j) for i in range(20) for j in range(20) if i!=j and (i-10)*(j-10)>0])
A_caveman = nx.adjacency_matrix(G_caveman).todense()
[Link](A_caveman)
<[Link] at 0x167fca4fec0>
keyboard_arrow_down Suppose you didn't know who lived in which cave
In other words, the nodes were in some random order
[Link](42)
order = [Link](len(A_caveman))
A_caveman2 = A_caveman[order,:][:,order]
[Link](A_caveman2)
[Link] 2/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
<[Link] at 0x167fcad7c80>
keyboard_arrow_down How can we figure out which nodes are in the same cave?
Let's look at a few rows of the adjacency matrix.
fig, axes = [Link](nrows=3, ncols=2, figsize=(12,2))
for i, r in enumerate([0, 2, 4, 5, 15, 12]):
col = i % 2
row = int(i/2)
axes[row, col].matshow(A_caveman2[r:r+1])
axes[row, col].set_title(f"Row {r}")
axes[row, col].[Link].set_ticks([])
axes[row, col].[Link].set_ticks([])
Idea: Run K-Nearest Neighbors clustering with these rows as the feature vectors
It would group rows 0/4/15 into one cluster, and 2/5/12 into another
Clusters = Communities
(We'll improve upon this idea later)
keyboard_arrow_down How would we check if the communities were good?
SUPPOSE someone told us here are the communities.
Maybe by doing K-Nearest Neighbors.
How would we check?
We would reorder the nodes by grouping people from the same club together
Then, we would look at the new adjacency matrix
[Link](order>=10)[0]
[Link] 3/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
array([ 2, 5, 7, 8, 9, 12, 14, 16, 17], dtype=int64)
someone_says_community1 = [0, 1, 3, 4, 6, 10, 11, 13, 15, 18]
someone_says_community2 = [2, 5, 7, 8, 9, 12, 14, 16, 17]
# Reordering the nodes
ordering = [Link]([someone_says_community1, someone_says_community2])
ordering
array([ 0, 1, 3, 4, 6, 10, 11, 13, 15, 18, 2, 5, 7, 8, 9, 12, 14,
16, 17])
# Adjacency matrix with new ordering
A_caveman2_ordered = A_caveman2[ordering][:, ordering]
# What does the reordered adjacency matrix look like?
[Link](A_caveman2_ordered)
<[Link] at 0x167fcb2acf0>
SUPPOSE someone told us here are the communities. How would we check?
IF the memberships are correct, the reordered adjacency matrix is block-structured.
Note: Whether the big block is first or the small block doesn't matter.
keyboard_arrow_down Stochastic Blockmodel: Generalizing the caveman graph
First, we will restate what we did in the caveman graph using new terminology.
n = 10 # number of nodes
# Each node can belong to one of two clubs
clubs = [Link](2, size=n)
clubs
array([1, 1, 1, 0, 0, 1, 1, 1, 0, 1])
# interests = club memberships matrix
interests = [Link]((n, 2))
interests[[Link](n), clubs] = 1
interests
array([[0., 1.],
[0., 1.],
[0., 1.],
[1., 0.],
[1., 0.],
[0., 1.],
[Link] 4/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
[0., 1.],
[0., 1.],
[1., 0.],
[0., 1.]])
Each row represents one person
The first number is the person's interest in club #1.
The second number is the interest in club #2.
keyboard_arrow_down From interests to network
Fans of club #1 become friends
Fans of club #2 become friends
club1_fans = interests[:,0] # Everyone's interest in club #1
club1_fans
array([0., 0., 0., 1., 1., 0., 0., 0., 1., 0.])
# A1[i,j] = club1_fans[i] * club1_fans[j]
A1 = [Link](club1_fans, club1_fans)
[Link](A1)
<[Link] at 0x167fde95e50>
club2_fans = interests[:,1] # fans of club #2
club2_fans
array([1., 1., 1., 0., 0., 1., 1., 1., 0., 1.])
A2 = [Link](club2_fans, club2_fans)
[Link](A2)
[Link] 5/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
<[Link] at 0x167fdf3c410>
# All friendships together gives the adjacency matrix A
A = A1 + A2
[Link](A)
<[Link] at 0x167fca65460>
keyboard_arrow_down From interests to network (in one step)
# Same thing, without all the intermediate steps
A = interests @ interests.T # Matrix multiplication
[Link](A)
[Link] 6/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
<[Link] at 0x167fdf3f9b0>
keyboard_arrow_down From network to interests
You see the network. How can you figure out the club memberships?
⇒ To find the right memberships, we need to find the ordering that makes the matrix block-structured.
keyboard_arrow_down Method #1: Communities via modularity
G = nx.from_numpy_array(A)
# Communities via modularity
communities = [Link].louvain_communities(G)
communities
[{0, 1, 2, 5, 6, 7, 9}, {3, 4, 8}]
# Check if the communities give a block-structured matrix
ordering = [Link]([list(x) for x in communities])
[Link](A[ordering][:, ordering])
<[Link] at 0x167fe47d100>
[Link] 7/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
keyboard_arrow_down Method #2: Communities via spectral decomposition
eigenvalues, eigenvectors = [Link](A)
eigenvalues
array([-4.88399708e-16, -4.17365745e-16, -4.44899761e-17, -5.41731251e-34,
3.81435172e-19, 3.71371796e-18, 2.61415001e-16, 4.46874861e-16,
3.00000000e+00, 7.00000000e+00])
The eigenvalues are returned in ascending order.
The two largest ones are 7 and 3; the rest are pretty much 0
Let's look at the last two eigenvectors
fig, ax = [Link](figsize=(10,5))
[Link](eigenvectors[:,-2:])
<[Link] at 0x167fca66300>
Each row corresponds to one node.
Clearly, the rows are of two types
$\Rightarrow$ K-Nearest Neighbors clustering of the rows of the eigenvector matrix
Previously for the caveman: we clustered the rows of the adjacency matrix
from [Link] import KMeans
model = KMeans(n_clusters=2)
[Link](eigenvectors[:,-2:]) # K-Means on the eigenvector rows
predicted_clubs = model.labels_ # Clusters found by K-Means
predicted_clubs
C:\Users\deepay\Miniconda\Lib\site-packages\sklearn\cluster\_kmeans.py:1446: UserWarning: KMeans is known to have a memory l
[Link](
array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0])
predicted_club1_members = [Link](predicted_clubs==0)[0]
predicted_club2_members = [Link](predicted_clubs==1)[0]
print('Predicted clubs', predicted_club1_members, 'and', predicted_club2_members)
Predicted clubs [0 1 2 5 6 7 9] and [3 4 8]
ordering = [Link]([predicted_club1_members, predicted_club2_members])
[Link](A[ordering][:, ordering])
[Link] 8/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
<[Link] at 0x16780407fb0>
keyboard_arrow_down Generalization
Story so far:
all fans of the same club become friends
fans of different clubs do not become friends.
Generalization:
Fans of the same club become friends with probability 0.8 (say)
Fans of different clubs become friends with probability 0.10 (say)
# B = cluster-connection matrix
B = [Link]([[0.8, 0.1], [0.1, 0.8]])
[Link](B)
Previously: A = interests @ interests.T # Adjacency matrix
This gave us the caveman graph
Now: We create a probability matrix, from which we sample the adjacency matrix
P = interests @ B @ interests.T # Probability matrix depends on the cluster-connection matrix B
P
array([[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.1, 0.1, 0.1, 0.8, 0.8, 0.1, 0.1, 0.1, 0.8, 0.1],
[0.1, 0.1, 0.1, 0.8, 0.8, 0.1, 0.1, 0.1, 0.8, 0.1],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.1, 0.1, 0.1, 0.8, 0.8, 0.1, 0.1, 0.1, 0.8, 0.1],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8]])
A = [Link](1, P) # Friendships are random
A
array([[1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
[1, 1, 0, 0, 0, 1, 1, 1, 0, 1],
[0, 1, 1, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 0, 1, 1, 0],
[1, 1, 1, 0, 0, 1, 1, 1, 0, 0],
[Link] 9/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
[0, 1, 1, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 1]])
[Link](A)
<[Link] at 0x167808be510>
keyboard_arrow_down Does the block-structure still apply?
ordering = [Link]([[Link](clubs==0)[0], [Link](clubs==1)[0]]) # Actual communities
[Link](A[ordering][:, ordering])
<[Link] at 0x167808e0950>
Still roughly block structured.
In the caveman graph, it was exactly block-structured.
keyboard_arrow_down Finding the Communities
Same ideas as before.
keyboard_arrow_down Method #1: Modularity
[Link] 10/11
12/4/24, 8:08 AM Stochastic [Link] - Colab
G = nx.from_numpy_array(A)
communities = [Link].louvain_communities(G)
communities
[{2}, {3, 4, 8}, {7}, {0, 1, 5, 6, 9}]
Got split into too many communities
# Play with the resolution to get the desired number of communities
communities = [Link].louvain_communities(G, resolution=0.5)
communities
[{3, 4, 8}, {0, 1, 2, 5, 6, 7, 9}]
ordering = [Link]([list(x) for x in communities])
[Link](A[ordering][:, ordering])
<[Link] at 0x167809c6510>
[Link] 11/11