SMA Module2B
SMA Module2B
Analytics
Module 2
Network Measures
Overview - Network Measures
Properties of Real-World Networks –
High Average Local Clustering
Coefficient, Small-world Property,
Scale-free Property.
Random Network Model- Degree
Distribution of Random Network,
Evolution of a Random Network,
Average Path Length, Clustering
Coefficient, Random Network vs.
Real-world Network
Network Models
• Social network models are theoretical frameworks that
represent and analyze the patterns of relationships and
interactions among entities
• Can we characterize, model, and reason about the structure of
social networks?
• Utilizing network models in platforms like Facebook, Instagram,
etc. is essential for comprehending and optimizing the intricate
web of user interactions and relationships.
• These models serve as foundational tools for analyzing the
structure and dynamics of social networks
Challenges
• Utilizing network models instead of analyzing the complete real-world social networks
directly offers several advantages, particularly in the context of platforms like Facebook
and Instagram
• Scalability and Complexity Management:
– Real-world social networks are vast and intricate, comprising billions of users and
connections.
– Direct analysis of such massive networks is computationally intensive and often
impractical.
– Network models abstract essential features, enabling researchers to study the
network's properties without the need to process the entire dataset.
• Privacy and Ethical Considerations:
– Analyzing real user data raises significant privacy concerns.
– Network models allow for the study of social dynamics without exposing personal
information, as they can generate synthetic data that mirrors the statistical properties of
real networks.
– This approach facilitates research while safeguarding user privacy.
Challenges
• Hypothesis Testing and Simulation:
–Researchers can manipulate model parameters to observe potential
outcomes, such as the spread of information or the impact of removing
certain nodes, which would be challenging to perform on the actual network.
• Generalization and Theory Development:
–By focusing on models, researchers can identify underlying principles and
patterns that apply across different networks.
• Data Availability and Accessibility:
–Access to complete real-world network data is often restricted due to
proprietary or privacy reasons.
–Network models, however, can be constructed using available data samples or
inferred characteristics, making it feasible to conduct meaningful analyses
without requiring full access to the original network.
So, what do we do?
Design models that generate graphs
– The generated graphs should be similar to
real-world networks.
• If we can guarantee that generated graphs
are similar to real-world networks:
1. We can analyze simulated graphs instead of
real-networks (cost-efficient)
2. We can better understand real-world
networks by providing concrete
mathematical explanations; and
3. We can perform controlled experiments on
synthetic networks when real-world Basic Intuition:
Hopefully! Our complex output [social network] is
networks are unavailable. generated by a simple process
Synthetic Networks (Models)
•Small-world property
•High average local clustering
coefficient
•Scale-free property
Properties of Real-world Networks
• Small-World Phenomenon:
– Most individuals are connected by short chains of acquaintances, leading to the
"six degrees of separation" concept.
– This property facilitates rapid information dissemination across the network.
• Scale-Free Distribution:
– These networks often have a few highly connected nodes (hubs) and many nodes
with fewer connections, following a power-law distribution.
– This structure makes the network robust against random failures but vulnerable to
targeted attacks on hubs.
• High Clustering Coefficient:
– Individuals tend to form tightly knit groups where members are more
interconnected, resulting in a higher probability that two friends of a person are
also friends with each other.
Properties of Real-world Networks
• Community Structure:
– Social networks naturally divide into communities or clusters with dense internal
connections and sparser links between groups. This modularity reflects shared
interests or affiliations among members.
• Assortative Mixing:
– Nodes with similar attributes, such as age, profession, or socioeconomic status, are
more likely to connect with each other, leading to homophily within the network.
• Dynamic Evolution:
– Social networks are not static; they evolve over time as individuals form new
connections or sever existing ones, influenced by factors like changing interests,
life events, or technological advancements.
Small-world Property
An outcome of Stanley Milgram’s small-world
experiment (1967) to measure the probability
of two random persons being known to each
other
Can also be viewed in light of the average
path length between two randomly chosen
nodes in a network
A network 𝐺 is said to follow the small-world
property if the average path length of the The "six degrees of
network is logarithmically proportional to the separation" model
https://en.wikipedia.org/wiki/Small-
network size. world_experiment
𝑨𝒗𝒆𝒓𝒂𝒈𝒆 𝑷𝒂𝒕𝒉 𝑳𝒆𝒏𝒈𝒕𝒉 ∝ 𝐥𝐨𝐠(𝑵𝒆𝒕𝒘𝒐𝒓𝒌 𝑺𝒊𝒛𝒆)
How Small is the World?
• A rumor is spreading over a social
network.
–Assume all users pass it immediately to Rumor Spreading on Facebook
all of their friends
Approximately 2.32
Instagram 0.42
steps
High (specific value not
LinkedIn Approximately 3.5 steps
provided)
Low to Moderate
Twitter Approximately 4.8 steps (specific value not
provided)
Scale-free Property: Power Law
Power Law: a relative change in the value of one
variable leads to the proportional change in the value
of other variable
Independent of the initial values of both the variable
Mathematically,
𝑦 ∝ 𝑥 −𝑏 , 𝑏 ∈ ℝ
Functions that follows power-law are scale-invariant
Pareto Principle (or the 80/20 rule) in Economics: 80%
of the outcomes are results of 20% of the causes
Power law principle is also coined as Pareto
https://exploitingchange.com/2016/09/14/a Distribution
nother-powerful-idea-the-power-law/
Distributions – Real Examples
• Wealth Distribution:
– Most individuals have average capitals,
– Few are considered wealthy.
– Exponentially more individuals with average
capital than the wealthier ones.
• City Population:
– A few metropolitan areas are densely populated
– Most cities have an average population size. Herbert A Simon,
On a Class of Skew Distribution Functions, 1955
• Social Media:
– We observe the same phenomenon regularly The Pareto principle
(80–20 rule): 80% of the effects come
when measuring popularity or interestingness for from 20% of the causes
entities.
Distributions
• Site Popularity:
– Many sites are visited less than a 1,000 times a month
– A few are visited more than a million times daily
• User Activity:
– Social media users are often active on a few sites
– Some individuals are active on hundreds of sites
• Product Price:
– There are exponentially more modestly priced products for sale compared to
expensive ones.
• Friendships:
– Many individuals with a few friends and a handful of users with thousands of
friends
– (Degree Distribution)
Simon Model
• Herbert Simon is widely recognized for his significant contribution to understanding
power laws in social networks, particularly through his "Simon model" which
explains how a system with preferential attachment can generate power-law
distributions
• A small number of nodes in a network hold a disproportionate amount of
connections, mirroring phenomena observed in real social networks
• Essentially, the more connected a node is, the more likely it is to gain further
connections over time.
• Key points about Herbert Simon and power laws:
• The Simon Model describes a mechanism where new elements (like individuals in a
social network) are added to a system with a higher probability of attaching to
already well-connected elements, leading to a power-law distribution of
connections.
Simon Model
• Application to Language: Simon initially applied his model to explain the
frequency distribution of words in a text, where common words appear
much more frequently than rare words, following a power law pattern.
• Concept of "Preferential Attachment": A key element of the Simon model
is the concept of preferential attachment, meaning that new connections
are more likely to be made to nodes with existing high connectivity.
• Relevance to Social Networks: By applying the Simon model to social
networks, researchers can explain why a small number of individuals in a
social group tend to have significantly more connections than the
majority.
Scale Free Networks
OR
The total sales volume of unpopular
By a much larger population of items, taken together, is very
items that are each individually significant.
57% of Amazon’s sales is
less popular? from the long tail
Power Law in Scale-Free Social Networks
• A scale-free network is a type of network where a few nodes (users) have a
massive number of connections, while most nodes have only a few. This follows
a power-law degree distribution.
• Examples :
–Facebook & Instagram – Celebrities and influencers (e.g., Cristiano Ronaldo,
Virat kohli) have millions of followers, while most users have only a few
hundred.
–Twitter (X) – A few accounts (e.g., Elon Musk, Narendra Modi) dominate
conversations with millions of followers, while most users have a small
audience.
–LinkedIn – Thought leaders and CEOs (e.g., Satya Nadella) have vast
professional networks, while most professionals have relatively small ones.
Impact
• Virality & Influence: A tweet or post from a highly connected user can
reach millions instantly.
• Misinformation Spread: Fake news can spread quickly when influential
users share it.
• Targeted Marketing: Companies can use power-law properties to focus
on influential users for advertising.
• By understanding scale-free networks, social media platforms can design
better recommendation systems, control misinformation, and optimize
network structures.
Impact of the Long Tail
• The Long Tail allows niche creators to find an audience despite not being top influencers.
– Example: On Instagram, a small travel blogger with 5K followers might not compete with
Virat Kohli but can still attract a dedicated niche audience.
– Platforms like YouTube, Instagram, and Facebook encourage niche content discovery using
recommendation algorithms.
• Brands increasingly target micro- and nano-influencers rather than only top celebrities.
– Example: Local businesses in India prefer collaborating with regional influencers rather
than Bollywood stars.
• Network Resilience & Decentralization
– Unlike scale-free networks dominated by hubs, the Long Tail provides redundancy and
prevents total collapse if a few key users (hubs) leave.
– Example: On Twitter (X), even if Elon Musk deletes his account, millions of niche
conversations will still continue.
• Platforms like Facebook and LinkedIn monetize engagement from the Long Tail by:
– Personalized ads for niche communities
– Subscriptions (e.g., LinkedIn Premium for professionals in small industries)
Network Models
• A basic problem in network modeling is to define a random process that
generates networks that are similar to those in the real world.
• The simplest model:
• Suppose we want a network with N nodes and E edges
–Create a graph with N nodes
– For every pair of nodes (i, j), connect them with probability p
– If we want the expected number of edges to be E, then we should set
𝑛
• Any of the 2
edges can be formed independently, with probability p
• Similarities:
–In the limit (when 𝑛 is large), both 𝐺(𝑛, 𝑝) and 𝐺(𝑛, 𝑚) models
act similarly
𝑛
• The expected number of edges in 𝐺(𝑛, 𝑝) is 2
𝑝
𝑛
• We can set 2
𝑝 = 𝑚 and in the limit, we should get similar results
• Differences:
–The 𝐺(𝑛, 𝑚) model contains a fixed number of edges
–The 𝐺(𝑛, 𝑝) model is likely to contain none or all possible edges
Degree Distribution
• This is a binomial degree distribution. In the limit this will become the
Poisson degree distribution
Erdős-Rényi Network: Degree Distribution
For a node in 𝐺(𝑁, 𝑝) to have degree 𝑘, the
corresponding node must be adjacent to 𝑘 other nodes
of the network
𝑁−1
We can choose these nodes in 𝑘
ways
𝑁−1 𝑘
𝑃 𝑘 = 𝑝 (1 − 𝑝)𝑁−1−𝑘
𝑘
Letting 𝑁 → ∞,
𝑒− 𝑑 𝑑 𝑘
Comparison of degree distribution 𝑃 𝑘 =
𝑘!
Instagram – Degree Distribution
Degree distribution of the Twitter (X) friendship graph.
Erdős-Rényi Networks vs. Real-life Networks:
Presence of Outliers
In an Erdős-Rényi Network,
probability of having a node
with a high degree is
extremely low
probability of a node with
2000 neighbours is 10−27 !
In real-world networks, such
nodes exist (Hubs)
Celebrities in social networks
Erdős-Rényi Network: Average Path Length
When 𝑙𝑚𝑎𝑥 represent the maximum path length of
𝐺(𝑁, 𝑝),
1+ 𝑑 + 𝑑 2 + 𝑑 3 + ⋯⋯⋯+ 𝑑 𝑙𝑚𝑎𝑥 =𝑁
In 𝐺 𝑁, 𝑝
𝑑
Number of possible edges between neighbours of a node: 2
𝑑
Expected number of edges between these nodes: 𝑝 × 2
The above yields, the local clustering coefficient for a node 𝑣𝑖 ∈ 𝐺
𝑑
𝐶𝑖 = 𝑝 ≈
𝑁
Theorem: The local clustering coefficient for any node in an Erdős-Rényi Network is
inversely proportional to the size of the network
Note: The local clustering coefficient for any node in an Erdős-Rényi Network does not
depend on the degree of the node
Erdős-Rényi Networks vs. Real-life Networks:
Clustering Coefficient
Avg. Clustering
Network
Coefficient
local clustering coefficients in Erdős-Rényi
ER Random Graph Networks decreases with the increase in
~ 0.001
(N=10K, p=0.001) network size
Facebook (global
~ 0.61 Nodes having high local clustering coefficient
network)
exist in enormously large real-life networks
Instagram ~ 0.55
Diameter 0 2 6 1
• Phase Transition: the point where diameter value starts to shrink in a random
graph
–We have other phase transitions in random graphs
• E.g., when the graph becomes connected
• The phase transition we focus on happens when
–average node degree 𝑐 = 1 (or when 𝑝 = 1/(𝑛 − 1) )
• At this Phase Transition:
1. The giant component, which just started to appear, starts to grow, and
2. The diameter, which just reached its maximum value, starts decreasing.
Random Graphs
If 𝒄 < 𝟏:
– small, isolated clusters
– small diameters
– short path lengths
1.0
At 𝒄 = 𝟏:
– a giant component appears
– diameter peaks
– path lengths are long
For 𝒄 > 𝟏:
– almost all nodes connected 0 1.0 c
– diameter shrinks
– path lengths shorten
phase transition
Modeling with Random Graphs
• Nodes with higher degrees are more likely to attract new connections
(rich-get-richer effect).
• Degree distribution follows a power law: few nodes with very high
degrees, many nodes with low degrees.
• Results in a scale-free network with highly connected hubs.
• Found in social networks like Facebook, Twitter, and LinkedIn, where
influencers or popular accounts have significantly more connections.
• Enables robust connectivity but can be vulnerable to targeted attacks on
hubs.
Barabasi-Albert Network Model
• Real-life social networks often evolve with time
• Originates with a small seed network
• The network grows as new nodes and edges gets attached to the
network with time
• Barabasi-Albert model follows the same principle of network
evolution
• Also known as Preferential Attachment model or rich gets richer
model
• Generates scale-free networks
Preferential Attachment: Example
• Node 𝑣 arrives
𝑃(1) = 1/7 𝑃(5) = 3/7
𝑣
• 𝑃(4) =0
• 𝑃(5) = 3/7 2 3 4
Barabasi-Albert Model: Network Growth
𝐺 = 20 𝐺 = 50 𝐺 = 100
Average Path Length = 2.16 Average Path Length = 2.69 Average Path Length = 3.02
• https://www.researchgate.net/publication/259742981_Information_Theory_Kolmogorov_Complexity_and_Algorithmic_Probability_in_Network_Biology
Barabási-Albert (BA) Watts-Strogatz (WS)
Property Real Social Networks
Model Model
No, follows a more
Power Law Degree Yes, follows power-law Yes, follows power-law
uniform degree
Distribution distribution distribution
distribution
High Clustering
Moderate clustering High clustering Very high clustering
Coefficient
Moderate, depends on
Yes, strong small-world
Small-World Property preferential Yes, by design
property
attachment
Yes, hubs exist
Hubs (Highly Yes, hubs naturally No, hubs do not
(influencers,
Connected Nodes) emerge naturally form
celebrities)
Grows with
Starts with rewiring of Grows dynamically
Connectivity Growth preferential
regular network with new users
attachment
Models scale-free Models small-world Captures both scale-
Applicability to Social
nature well, but low behavior well, but lacks free and small-world
Networks
clustering power-law distribution properties
END