0% found this document useful (0 votes)
25 views28 pages

Unit Ii

The document discusses the application of graph theory and adjacency matrices in social network analysis, emphasizing the representation of users as nodes and their interactions as edges. It outlines various types of graphs, their properties, and how they are utilized in platforms like Facebook and Twitter for modeling relationships, community detection, and recommendation systems. Additionally, it covers concepts such as path connectivity, breadth-first search algorithms, and the importance of graphs in optimizing social networking features.

Uploaded by

Saman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views28 pages

Unit Ii

The document discusses the application of graph theory and adjacency matrices in social network analysis, emphasizing the representation of users as nodes and their interactions as edges. It outlines various types of graphs, their properties, and how they are utilized in platforms like Facebook and Twitter for modeling relationships, community detection, and recommendation systems. Additionally, it covers concepts such as path connectivity, breadth-first search algorithms, and the importance of graphs in optimizing social networking features.

Uploaded by

Saman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 28

Unit –II

Graphs and Matrices

Graph and Adjacency matrix:

“Social network analysis is the mapping and measuring of relationships and flows between
people, groups, organisations, computers or other information/knowledge processing entities”
Social Network Analysis (SNA) is a method for visualizing our people and connection power,
leading us to identify how we can best interact to share knowledge.

What is a relationship? In an interpersonal context, it can be friendship, influence, affection,


trust, dislike, conflict, or many other things.

What is graph theory: Graph is a mathematical structure that allows you to represent everyday
problems in a graphic way. In addition, network theory allows you to represent only one type of
relationship (simple representation), but it also allows you to represent more than one type (in that
case, it would be called multiple).
 A graph is a mathematical abstraction that consists of two sets: a set of nodes (also called
vertices/actors/user) and a set of edges (also called links/relationships/connection).
 Nodes represent the entities in the network, such as users, posts, hashtags, or products,
Individuals or organizations.
 Edges represent the relationships or interactions between the nodes, such as following, liking,
commenting, or buying.
 A graph can be directed or undirected, depending on whether the edges have a direction or not. A
graph can also be weighted or unweighted, depending on whether the edges have a numerical
value or not.
A graph-generally denoted G (V, E) or G= (V, E) - consists of the set of vertices V unitedly with a set of
edges E. The number of vertices in a graph is normally denoted n while the number of edges is normally
denoted m.

Applying graph theory to social networks:


1.Graph Theory in Facebook: Majority are familiar with Facebook these days. You can click ‘like’ if
you find something likeable, ‘tag’ your friends in various ‘posts’, put comments in posts and most
importantly befriend someone whom you know and also someone whom you don’t know! The concept
of graph theory is used in Facebook with each person as nodes and every like, share, comment,
tag as edges.
2.Graph Theory in Twitter: Here the persons are considered as nodes and if one person follows
another then that is considered as the edge between the two.

3.Let’s think about the commercial strategy that any telecommunication company seeking to
know the composition of the links would carry out. You would be interested in knowing which
people we usually talk to and thus adapt your commercial strategy to offer personalized offers
and/or
rates.
In addition to this, applying the networks to social networks can work to adapt the products to
the real needs, making them appear at the right time.
When we talk about networks applied to social networks the most common is that they are used
to “detect communities”. Thanks to the algorithms we can see characteristics, attributes and
relationships that match within a group.
When we analyze the subnetworks, we can see the vertices that are most related to each other,
and also how they relate to the rest of the vertices.
If we look at the graph above, we can see that three different communities have been detected, in
which we can assume that all the members of the same community have characteristics or
attributes.

Use of Graphs in Social Networking

Graphs play a crucial role in modeling and analyzing social networks. In a social network,
people (users) are represented as nodes (vertices), and relationships (friendships, follows,
interactions) are represented as edges (links).

 Adjacency matrices:
Every network can be expressed mathematically in the form of an adjacency matrix. In these
matrices the rows and columns are assigned to the nodes in the network and the presence of an
edge is symbolised by a numerical value. By using the matrix representation of the network, we
can calculate network properties such as degree, and other centralities by applying basic concepts
from linear algebra.
An adjacency matrix is a two-dimensional array that stores the presence or absence (or weight)
of an edge between every pair of nodes.
If there are n nodes, the matrix will be n × n in size.
Types:

1. Undirected Graph (Unweighted)

 Connections are mutual (if A is connected to B, then B is connected to A).


 Matrix is symmetric.
 Uses 0 and 1 only.
2. Directed Graph (Unweighted)

 Direction matters: xij = 1 means i → j (but not necessarily j → i).


 One-way connection
 Matrix can be asymmetric.

3. Undirected Graph (Weighted)

 Mutual connections, with weights instead of just 0/1.


 Matrix is symmetric.
 Example weights are hypothetical.

4. Directed Graph (Weighted)

 Direction matters and weights are used.


A network with undirected, unweighted edges will be represented by a symmetric matrix
containing only the values 1 and 0 to represent the presence and absence of connections, respectively.
Directed and weighted networks can make use of different numerical values in the matrix to
express these more complex relationships.
Examples:

Andy Bil Car Dan Ele Fra Gar


Andy 1 0 1 0 0 1
Bill 1 1 0 1 0 0
Carol 1 1 1 1 0 0
Dan 1 1 1 0 0 0
Elena 0 0 0 0 1 0
Frank 0 0 0 0 1 0
Garth 1 1 0 0 0 0

If the matrix as a whole is called X, then the contents of any given cell are denoted x ij. For example, in the matrix
above, xij = 1, because Andy likes Bill. Note that this matrix is not quite symmetric (xij not always equal to xji).
Anything we can represent as a graph, we can also represent as a matrix. For example, if it is a valued graph, then
the matrix contains the values instead of 0s and 1s.

By convention, we normally record the data so that the row person "does it to" the column person. For example, if
the relation is "gives advice to", then xij = 1 means that person i gives advice to person j, rather than the other way
around. However, if the data not entered that way and we wish it to be so, we can simply transpose the matrix. The
transpose of a matrix X is denoted X'. The transpose simply interchanges the rows with the columns.

Applications of Graphs in Social Networks

1. Friendship and Connection Modeling


o Social media platforms like Facebook and LinkedIn use undirected graphs,
where an edge represents a mutual friendship/connection.
o Twitter, Instagram, and TikTok, Youtube use directed graphs, where an edge
represents a "follow" (one user follows another).
2. Recommendation Systems (Friend Suggestions)
o Platforms like Facebook and LinkedIn use Graph-based algorithms (PageRank,
HITS, Personalized Recommendations) to suggest new friends or connections.
o Example: "People You May Know" feature.
3. Community Detection
o Clustering algorithms identify groups or communities where users are more
closely connected.
o Used for detecting interest-based groups on Facebook, Reddit, or Discord.
4. Influencer and Trend Analysis
o Platforms like Twitter and Instagram use graph centrality measures
(PageRank, Betweenness Centrality) to identify influential users.
o Helps in marketing and advertising campaigns.
5. Content and Advertisement Targeting
o Social networks analyze user interaction graphs to show personalized ads.
o Example: Facebook Ads use graph data to target users based on their connections
and interactions.
6. Viral Marketing and Information Spread
o Graphs help model how information spreads in a network.
o Companies use diffusion models (like the Independent Cascade Model) to
predict viral trends.
7. Detecting Fake Accounts and Spam
o Graph analysis helps detect bot networks or fake accounts by identifying
suspiciously dense subgraphs.
o Example: Twitter uses graph-based anomaly detection to flag spam bots.
8. Shortest Path Analysis (Navigation & Messaging)
o Social media platforms use shortest path algorithms (like Dijkstra's or BFS) to
find the shortest connection between two users.
o Used in network traversal (e.g., LinkedIn’s "You are X connections away").

Graphs are fundamental notion to building, analyzing, and optimizing social networks.
Whether it’s recommending friends, detecting influencers, or securing platforms, graph-
based techniques drive modern social networking features.
 Path and Connectivity:

Path refers to a sequence of connected vertices in a graph, essentially a route from one vertex to
another along edges, while connectivity describes whether there exists a path between every pair
of vertices within a graph, meaning a connected graph has a path between any two vertices
within it; if not, it's considered disconnected.

The most general form of connection between two actors in a graph is called a walk. A walk is a
sequence of actors and relations that begins and ends with actors.

Path: Path in a network is the sequence of edges leading from one node to another. The number of edges
between two nodes on a given path is considered distance. Related, the length of the shortest path
between two nodes is commonly called the “geodesic distance.”

Perhaps the most useful definition of a connection between two actors (or between an actor and itself) is a
path. A path is a walk in which each actor (and therefore each relation) in the graph may be used at most
once. The single exception to this is a closed path, which begins and ends with the same actor. All paths
are trails and walks, but not all walks and all trails are paths. In Figure, there are a limited number of
paths connecting A and C: {A, B, C}; {A, B, D, C}; {A, B, E, D, C}.

The length of a path is the number of relations in it. The length of a shortest path between two actors is
the geodesic distance between them. Thus, the geodesic distance between A and C in the graph of Figure
is 2.

Connectivity:

Connectivity is a property of a network (not of its individual actors) that extends the concept of
adjacency. If it is possible to establish a path from any actor to any other actor of a network (e.g. every
actor is reachable by every other one), the network is said to be connected; otherwise the network is
disconnected. The network represented in Figure is connected.

Adjacency refers the direct connection between two actors, it may be directed or un-directed. If
there are more path ways connect two actors then there is a high connectivity to reach the actors
from different paths.

Table 2.1.5 Point connectivity of any society information exchange

Society A Society B Society C Society D

Society A 3 5 1 8
Society B 4 6 2 7

Society C 1 2 0 1

Society D 2 5 3 4

From the table the Actor Society C is having weak connectivity. If any Actor refuses to send
message to one then it is difficult for Actor Society C to get most of the information.

Network Distance: The distance between links/ties. For example a link means Dev is the friend
Sharma. Sastri is the friend of friend of Dev which is distance of 2. Rao is the friend of friend of
friend of Dev which is having the network distance of 3.

 Breath First Search (BFS):

Breadth First Search is referred to as BFS. A vertex-based method is used to determine the
shortest path in a graph.

In the BFS traversal approach, every node on a level is first investigated before moving on to the
next level.

The BFS traversal uses a queue data structure. The idea of backtracking is not used in BFS.

BFS identifies the route from the source to the destination vertex that travels the fewest number
of edges.

For vertices that need to be found near the source vertex, BFS traversal is the best option.

The Breath First Search algorithm traverses the graph level by level. Let's express it
straightforwardly. As active users of LinkedIn, you may have noticed the presence of 1st, 2nd,
and 3rd connections. The definitions are presented below.

 1st connection: People who directly connect with you.

 2nd connection: People who are directly connected to your 1st connection but are
not directly connected to you.

 3rd connection: People who are directly linked to your 2nd connection but do not
have a direct connection with you.

As you might expect, when employing BFS to expand a network, the process involves traversing
all 1st connections before moving on to the 2nd connections and subsequent levels. This is an
accurate representation of how Breadth-First Search operates within a graph. The path will
traverse the graph in a layer-by-layer manner, rather than delving deeply into specific branches.
This can be further illustrated with a simple example. Suppose the network structure represents a
group of individuals, where connections indicate friendship. Let us commence with person A.

Figure 1: An example of a social network

It is evident that individuals A and B, as well as A and C, form friendships, whereas A and E,
and B and C, do not. As you can envision, the sequence will encompass A, followed by B, C,
and D, and ultimately, E, F, G, H, and I. This will be demonstrated through both visual
representations and code.

Here, the visited nodes will be marked as red, the current node (or equivalently, the first node in
the queue) will be marked as green, nodes in the queue but not the current node and not in the
visited set will be marked as purple, and unvisited nodes will be marked as blue in the tree, while
all nodes in the Visited and Queue will remain blue as usual. Initially, the items will be placed in
the Queue and then transferred to Visited. Subsequently, any neighbors that are not
in Visited will be added to the queue. This process will be repeated multiple times until all nodes
(people) have been checked, or until there are no remaining items in the queue. The
term Queue is utilized in reference to a data structure known as a queue, which adheres to the
principles of first in, first out (FIFO). Furthermore, these images depict the conclusion of the
respective procedures.
Step 1: Checking over A.
Figure 2: Checking over

Step 2: Moving A into the Visited and subsequently adding its neighboring nodes, B, C, and D,
to the Queue. It is important to note that B is marked as green, while C and D are marked as
purple, reflecting the current focus on node B.

Figure 3: Moving

Step 3: B is moved into the Visited, and then the neighbors of B, E, and F, are placed in
qthe Queue. While A is a neighbor of C, it cannot be added to the Queue as it is already in
the Visited.
Figure 4: Moving

Step 4: C is moved into the Visited category, and then the neighbor of C, which is G, is placed
in the Queue. Once more, the addition of A to the Queue is not permitted.

Figure 5: Moving

Step 5: D is moved into the Visited, and subsequently, the neighbors of D that are not already in
the Visited, namely H and I, are placed in the Queue.
Figure 6: Moving

Step 6: We are at E now. It is important to note that although E has a neighbor B, once it is
marked as Visited, B can no longer be added to the Queue. Consequently, E can be moved into
the Visited category.

Figure 7: Moving
Step 7: Moving F into the Visited category in a similar manner. The neighbor of F, which is B,
cannot be added to the Queue as it has already been visited.

Figure 8: Moving

Step 8: Moving G into the Visited.

Figure 9: Moving
Step 9: Moving H into the Visited.

Figure 10: Moving

Step 10: Moving I into the Visited.

Figure 11: Moving

In this instance, all nodes are depicted in red, indicating that the entire graph has been traversed
and the process has been finalized. The Visited path encompasses the
sequence A, B, C, D, E, F, G, H, and I, progressing from the upper level to the middle level, and
ultimately to the lower level.
Example, In Network broadcasting involves the transmission of information to receiving devices.
In general, a device will transmit information to its nearest recipient devices. They are referred to
as first-degree friends. The individuals classified as 1st-degree friends will disseminate the
information to their acquaintances, thereby functioning as 2nd-degree friends. This process will
continue until all recipients within the network have received the message.

This process progresses from one level to the next, mirroring the operational principles of
Breadth-First Search (BFS).

How is it different from DFS:

Slower than DFS is BFS.

Because it necessitates first studying every surrounding node, it is not appropriate for the
decision tree.

Because it uses more memory than DFS, it is not memory efficient.

Facebook, like many social networks, likely employs variations of graph algorithms for tasks
such as finding the minimum distance between users or checking connectivity. While the specific
algorithms used by Facebook are proprietary and not publicly disclosed, we can discuss the
general approaches:

1. Breadth-First Search (BFS): This algorithm is typically used for finding the
shortest path in unweighted graphs. If Facebook treats the social graph as
unweighted (where each connection has the same importance), BFS would be a
suitable choice to find the minimum distance between two users or to check if they
are connected.
2. Depth-First Search (DFS): While DFS can be used to explore the graph, it is not
optimal for finding the shortest path in unweighted graphs. It may be used in certain
scenarios, but it is less efficient than BFS for connectivity checks.
In practice, Facebook might use a combination of these algorithms along with optimizations and
heuristics, especially given the scale and complexity of their social graph.

 Network datasets: Network datasets are collections of data that represent


relationships or connections between entities (often called nodes) through links (called
ties or edges). These datasets are widely used in social network analysis, organizational
studies, and various other fields.
Fig: A simple network of five people. Two people are connected by an edge, if they are friends.

Here's a breakdown of the key components involved in network datasets and how they relate to
each other:

1. Nodes (Vertices/users/actors)

● Nodes represent individual entities or objects in a network.


● Examples: In a social network, nodes could represent people; in a citation network, they
could be academic papers; in a transportation network, they might represent stations or
airports.
● Each node can have its own set of attributes such as age, gender, location, or other
characteristics depending on the context of the network.

2. Ties (Edges/relationships/connections)

● Ties represent the relationships or interactions between nodes.


● Examples: In a social network, ties might represent friendships, messages, or
collaborations between individuals. In a transportation network, ties represent the routes
or connections between nodes (e.g., flights between airports).
● Types:
o Directed: In directed networks, ties have a direction, indicating the flow from one
node to another (e.g., Twitter follows where one user follows another).
o Undirected: In undirected networks, ties are bidirectional and don’t have a
specific direction (e.g., mutual friendships).
o Weighted: Ties have a numerical value (weight) representing the strength or
frequency of the relationship. Example: Number of messages exchanged
between two users.
o Unweighted: Ties are either present or absent, with no strength or frequency
value. Example: Two people are friends or not, no additional detail.
o
Ties Can Have Attributes:
 Strength: Frequency or intensity of interaction (e.g., talking daily vs. monthly).
 Weight: Numeric value (e.g., 50 messages exchanged).
 Type: Mode of interaction (e.g., online vs. offline).

3. Applications of Network Datasets

● Social Media: Analyzing relationships between users to understand influence,


information spread, and communities.
● Supply Chains: Understanding how goods and services flow through interconnected
companies or markets.
● Recommendation Systems: Collaborative filtering techniques often rely on network
datasets to suggest products based on similar users' preferences.

4. Analyzing Network Datasets

● Graph Theory: The mathematical study of graphs is the foundation for understanding
the structure of networks.
● Community Detection: Identifying clusters of nodes that are more densely connected to
each other than to the rest of the network.
● Centrality Measures: Identifying the most important or influential nodes in the network.
● Network Dynamics: Studying how the network changes over time, such as in response
to changes in connections or external factors.

Network datasets offer rich insights into complex systems, and analyzing them can reveal
important patterns and behaviors that are not easily identifiable through other means.

 5. Influencers/Centrality:
● Definition: Influencers are individuals or nodes within the network that have significant
impact or control over the behavior of other nodes.

● Actors those who are connected with more number of actors in the network has the maximum
centrality. This type of actors are having advantages as acting like bridges, third parties, getting
more information, transferring more information and having more alternative paths.
● In social networks, influencers are often key figures who drive trends, spread
information, or affect the decisions of others. They can be identified through measures
like centrality (e.g., betweenness centrality, degree centrality, eigenvector centrality).
Key Measures of Influence:

1. Degree Centrality:
o Measures the number of direct connections a node has.
o A higher degree centrality means the node is well-connected and can influence many
others.
o Example: A popular Instagram user with thousands of direct followers has high degree
centrality.
2. Betweenness Centrality:
o Measures how often a node acts as a bridge along the shortest path between other nodes.
o Nodes with high betweenness centrality play a key role in controlling information flow
in the network.
o Example: A journalist on Twitter who connects different communities by sharing
information between them.
3. Eigenvector Centrality:
o Measures a node’s influence based on the quality and number of its connections.
o A node is influential if it is connected to other highly influential nodes.
o Example: A celebrity followed by other famous influencers has a high eigenvector
centrality.
4. Closeness Centrality:
o Measures how close a node is to all other nodes in the network.
o Nodes with high closeness centrality can access and spread information quickly.
o Example: A company’s CEO who is directly connected to all department heads, enabling
rapid decision-making.

o
 Making Connections

Making connections is the foundational process of identifying and representing relationships between
individuals, groups, or entities within a network. This process translates real-world social interactions into
formats amenable to analysis using network science methods, serving as the critical first step in
understanding the complex dynamics of social networks.

What Making Connections Entails


1. Data Collection:
The initial step involves gathering raw data that represents social interactions. Sources include:
o Social media platforms (friendships, follows, mentions, shares)
o Communication logs (emails, messages)
o Collaboration records (co-authorships, project teams)
o Surveys or questionnaires
2. Defining Relationships:
Clearly defining what constitutes a "connection" is crucial. This definition can vary depending on
the context:
o "Friendship" on Facebook
o "Following" on Twitter
o "Co-worker" in a corporate network
o "Interaction" based on frequency of comments
3. Representing Connections:
Once connections are defined, they need to be represented in a way that can be analyzed. This is
typically done using:
o Graphs: Mathematical structures consisting of nodes (entities) and edges (connections).
o Adjacency Matrices: Tables that represent which nodes are connected.
o Edge Lists: Simple lists of pairs of connected nodes.
4. Quantifying Connections:
Sometimes, connections are not simply present or absent, but have varying degrees of strength.
This leads to the use of:
o Weighted Networks: Where edges are assigned numerical values representing the
strength of the connection.
o These weights can represent frequency of interaction, sentiment, or other relevant factors.

Why Making Connections Is Important


 Foundation for Analysis:
Without accurately representing connections, all subsequent analyses would be flawed.
 Revealing Network Structure:
Connections define the overall structure of the network, including:
o Clusters and communities
o Central nodes and influential individuals
o Paths and flows of information
 Understanding Social Dynamics:
Connections provide insights into how individuals and groups interact, collaborate, and influence
each other.
 Enabling Predictions:
By analyzing connection patterns, we can make predictions about future behavior, such as:
o The spread of information
o The formation of new relationships
o The evolution of trends

 Affiliation in Social Networks:

Affiliation refers to an individual’s connection to social groups, organizations, or communities. It


plays a crucial role in shaping relationships within a network.

Key Aspects of Affiliation:

● Homophily: People tend to associate with others who share similar interests, beliefs, or
backgrounds (e.g., political views, profession).
● Group Membership: Individuals form or join groups based on shared goals, values, or
interests (e.g., online communities, religious groups).
● Social Capital: Networks provide resources, information, and support, helping
individuals gain advantages in social and professional life.
● Structural Balance: Social relationships often exhibit balance, where individuals adjust
connections to maintain harmony in affiliations.

Examples of Affiliation:

● Workplace collaborations and professional networks.


● Academic affiliations and research collaborations.
● Online communities based on shared interests (e.g., gaming, fan clubs).

Identity in Social Networks:

Identity is the way individuals define themselves and are recognized within a social network. It
influences how people interact, the groups they belong to, and their perceived social roles.

Key Aspects of Identity:

● Social Identity Theory: People categorize themselves into in-groups (similar


individuals) and out-groups (dissimilar individuals), influencing their interactions.
● Role-Based Identity: A person’s identity changes depending on their role in different
social contexts (e.g., student, employee, activist).
● Online vs. Offline Identity: Digital platforms allow users to create and manage multiple
identities (e.g., professional LinkedIn vs. casual Instagram presence).
● Identity Performance: Individuals modify their behavior and self-presentation based on
the social context (e.g., formal in work settings, relaxed with friends).

Examples of Identity Influence in Networks:

● Political and ideological group formation on social media.


● Identity-based networking in professional fields (e.g., women in STEM groups).
 link analysis:
Link analysis, sometimes called ‘graph visualization’ or ‘network visualization’, is the process of
visually presenting networks of connected entities as nodes and links.

Typically, the nodes represent specific data points, and the links represent the connections
between them.

If you need to understand connections in your data, there can be huge advantages to looking
beyond your ‘flat’ data model with powerful link analysis tools.

How do you do link analysis?

Link analysis requires a step-by-step approach, which will vary depending on individual
technology stacks, and the maturity of your data management processes. A generic link analysis
process could be:

Step 1 – data collection – from various structured and unstructured sources. A benefit of link
analysis is the ability to connect information from disparate silos, for a more unified view of
information.

Step 2 – data cleansing and entity extraction – this involves normalizing and deduplicating
information, and ensuring nodes and links have the right attributes and identifiers for your
analysis.

Step 3 – building the data model – this is the translation of a conceptual model, based on the
information you want to uncover, into a logical model, based on the information you have. Read
more about data modeling.

Step 4 – visualization and analysis – this is when you see and interact with nodes and links in a
link analysis chart. You can enhance your visual analysis with various techniques and
algorithms, including centrality analysis, automated layouts, timeline visualization and node
grouping.

how to perform a link analysis process


Why do people use link analysis?

The world is densely connected, and link analysis will help you understand those connections.

Unlike other forms of data analysis, link analysis focuses as much on the connections between
data as on the data points themselves. That makes it possible to uncover hidden relationships and
identify patterns, like dependencies or anomalies, that are not visible in aggregated analysis.

It’s intuitive: Exploring networks as node-link structures instantly makes sense, even to people
who’ve never worked with connected data before.

It’s fast: Our brains are great at spotting patterns, but only when the info is in a tangible format.
Link analysis helps you identify trends and outliers quickly.

It’s holistic: Link analysis makes it possible to connect data points from diverse sources,
breaking down silos for a holistic view of the full data context.

It’s scalable: Link analysis lets you simplify complexity, see context and understand detail.
With one chart, you can get an overview or dive into specific connections.

It’s insightful: Through interactive data analysis, you gain deeper knowledge and understand
context. That’s hard to achieve with a static, aggregated visualization.

Link analysis use cases:


Connected data is all around us, and link analysis makes it possible to understand those connections.

Cybersecurity: Find patterns, anomalies and outliers in your complex threat landscape so you can
predict, detect and prevent cyber threats.

Financial crime & fraud: Analyze connections between people, places, transactions and events to
manage compliance and defeat fraudsters.

Security & Intelligence: Visualize threat information from communications data and intelligence sources
to beat organized crime and terrorism.
Law enforcement: Reveal connections, understand patterns, predict behaviors and gain insight that can
prove, solve or prevent crime.

Blockchain & crypto: Simplify blocks of transactions, track the movement of funds and verify crypto
users to counteract criminal activity.

Knowledge graphs: Make large-scale graphs accessible, understandable and intuitive through visual data
discovery and interactive analysis.

Pharmaceuticals: Use data visualization to facilitate breakthroughs in scientific understanding, cost


efficiences and competitive advantage.

Infrastructure: Map complex systems, networks and pipelines to identify critical paths, prevent
bottlenecks and uncover vulnerabilities.

Supply chain: Power end-to-end visibility, efficiency and traceability in supply networks to optimize
performance at every stage.

 Weighted Networks:
A weighted network is a graph where the edges have associated weights that represent the
strength, capacity, cost, or some other quantitative value of the connection between nodes. These
networks extend the concept of unweighted graphs, making them more suitable for real-world
scenarios where relationships are not uniform in strength.

Key Features of Weighted Networks:

1. Nodes (Vertices):
o Represent entities such as individuals, objects, or locations.
2. Edges (Links):
o Represent connections or relationships between nodes.
3. Weights:
o Quantitative values assigned to edges (e.g., frequency of interaction, distance, trust level,
flow capacity, monetary value). Weights can be positive or negative, representing
beneficial or detrimental relationships, respectively

Examples of Weighted Networks:


1. Social Networks:
o Weight represents the frequency of communication (e.g., the number of messages
exchanged between two individuals).
2. Transportation Networks:
o Weight represents travel time, distance, or cost between locations.
3. Biological Networks:
o Weight represents interaction strength (e.g., protein binding affinity in protein-
protein interaction networks).
4. Financial Networks:
o Weight represents transaction amounts or trade volume between entities.

Key Metrics in Weighted Networks:

 Strength (Weighted Degree): The sum of edge weights connected to a node. This metric
quantifies the total influence, activity, or capacity associated with a node. For example, in a
social network, a user's strength might be the total number of messages sent and received.
o In-Strength / Out-Strength: For directed weighted networks, representing the sum of
incoming and outgoing edge weights, respectively.
 Weighted Clustering Coefficient: Measures how tightly a node’s neighbors are connected,
taking weights into account. It quantifies the "cliquishness" of a node's local neighborhood,
giving more importance to stronger connections.
 Weighted Shortest Path: Pathfinding algorithms (e.g., Dijkstra’s algorithm, Bellman-Ford
algorithm) consider edge weights to find the "shortest" or most efficient path between two
nodes, where "shortest" could mean least cost, shortest time, or highest capacity.
 Weighted Betweenness Centrality: Measures the importance of a node in terms of weighted
shortest paths passing through it. A node with high weighted betweenness centrality acts as a
crucial "bridge" or "bottleneck" in the flow of information or resources.
 Weighted Closeness Centrality: Measures how "close" a node is to all other nodes in the
network, considering the sum of weighted shortest paths. Nodes with high weighted closeness
centrality can quickly reach other nodes.
 Weighted PageRank/Eigenvector Centrality: Extensions of centrality measures for unweighted
networks that incorporate edge weights to assess the influence or importance of nodes based
on the strength of their connections and the importance of their neighbors.

Applications of Weighted Networks:

 Recommendation Systems: Recommending products, movies, or content based on user-item


interaction weights (e.g., purchase frequency, viewing duration, rating scores). Stronger weights
indicate higher preference or engagement.
 Optimizing Transport or Logistics Networks: Finding optimal routes for delivery, managing
traffic flow, or designing efficient public transportation systems by considering travel times,
capacities, and costs as weights.
 Studying Patterns in Biological Systems: Analyzing brain networks (e.g., using fMRI data where
edge weights represent functional connectivity between brain regions), protein interaction
networks, or metabolic pathways to understand biological functions and disease mechanisms.
 Risk Assessment in Financial Systems: Identifying systemic risk by analyzing transaction
volumes and financial dependencies between institutions.
 Epidemiology: Modeling disease spread where weights represent the probability or intensity of
transmission between individuals.
 Supply Chain Management: Optimizing flow of goods and identifying critical bottlenecks by
considering capacity and cost weights.
 Information Flow and Influence Analysis: Understanding how information propagates in social
media or communication networks, giving more weight to stronger ties or frequent interactions.
 Community Detection: Identifying communities or clusters within networks, where weights can
influence the formation of groups (e.g., stronger ties within a community than between
communities).
 Resilience and Vulnerability Analysis: Assessing the impact of removing weighted edges or
nodes on the network's overall functionality and identifying critical components.

Hypergraphs:
A hypergraph generalizes the concept of a graph by allowing edges (called hyperedges) to
connect more than two nodes simultaneously. This is particularly useful for representing multi-
way relationships.

Structure of a Hypergraph:

1. Nodes (Vertices):
o Entities or elements in the system.
2. Hyperedges:
o Subsets of nodes that define multi-way relationships. Each hyperedge can connect
an arbitrary number of nodes.

Key Metrics in Hypergraphs:

1. Degree:
o Node degree: The number of hyperedges a node belongs to.
o Hyperedge size: The number of nodes in a hyperedge.
2. Clustering in Hypergraphs:
o Measures the tendency of nodes to form clusters in multi-way interactions.
3. Centrality:
o Captures the importance of a node or hyperedge in terms of the overall structure.

 Extend hypergraphs by assigning weights to hyperedges, representing the strength or


importance of the multi-way relationship.

Applications of Hypergraphs in Social Media Analytics

Social Media Network Analysis: By modeling interactions as hypergraphs, we can better


analyze complex group dynamics. For example, we can study how groups of users interact with
each other in group chats or on community pages.
Trending Topics and Hashtags: Hypergraphs help identify how users participate in multi-way
interactions like trending topics or hashtag campaigns, which involve many users interacting
around a single theme.

Content Recommendation: Hypergraphs are useful in recommender systems by analyzing


multi-way relationships (e.g., users, items, and ratings) to recommend relevant content or
products to users based on their group interactions, activities, or interests.

Influence and Virality Analysis: By looking at how information spreads through multi-way
interactions, hypergraphs can help us identify which users or groups are central in spreading
viral content or influencing the larger network.

Collaboration Networks: In social media, hyperedges could represent a group of users


collaborating on a project or post. For instance, a post that involves several users, such as a
group photo or collaborative content, can be represented as a single hyperedge linking all the
users together.

Group Chats (WhatsApp, Slack, etc.): In a group chat, a hyperedge can represent the group
itself. Rather than connecting individual pairs of people (as in a traditional graph), a hyperedge
connects all the users in the group simultaneously, allowing us to analyze the dynamics of
interactions within the group.

Event Participation: If multiple users attend an event or participate in a hashtag trend, we can
represent these multi-way relationships using a hyperedge. Instead of connecting pairs of users
who liked a post, the hyperedge would capture all users who attended or participated in the
event.

Random graph:
Random graph models are fundamental tools in social media analytics and data analysis,
providing a mathematical framework to understand and simulate the structure and evolution of
social networks. They help researchers gain insights into phenomena like information spread,
community formation, and influence. or

A random graph is a type of graph where edges (connections) between nodes (users or entities)
are formed randomly based on some probability. It is commonly used in social media analytics
to model and study unpredictable patterns of user connections and interactions.

What are Random Graph Models?

In the context of social media, a "graph" represents the network, where:

 Nodes (vertices) typically represent users, profiles, or pieces of content (e.g., posts, tweets).
 Edges (links) represent connections or interactions between these nodes (e.g., friendships,
followers, retweets, likes, mentions).
Random graph models are generative models, meaning they provide rules or probabilities for
creating connections between nodes, allowing us to understand how different network structures
emerge. The "randomness" comes from the probabilistic nature of these connections.

Key Random Graph Models and Their Relevance to Social Media:

1. Erdos-Renyi (ER) Model:


o This is the simplest random graph model. It assumes that every possible edge between
any two nodes exists with a fixed, independent probability p.
o While historically important for laying the groundwork of random graph theory, the ER
model often fails to capture key characteristics of real-world social networks. Its degree
distribution (number of connections a node has) tends to follow a Poisson distribution,
which means most nodes have roughly the same number of connections. This is unlike
social media, where a few users have many connections (hubs) and most have few.
o Limitations: Low clustering coefficient (lack of "cliquishness" – friends of your friends
aren't necessarily friends with each other) and does not account for the formation of hubs.
2. Watts-Strogatz (WS) Model (Small-World Networks):
o This model attempts to bridge the gap between perfectly regular networks (where every
node has the same number of neighbors) and completely random networks (like ER). It
starts with a regular lattice and then "rewires" a small fraction of its edges randomly.
o It successfully captures two crucial properties observed in many social networks:
 Small-world property: Short average path lengths (meaning you can connect to
anyone in the network through a surprisingly small number of steps, akin to "six
degrees of separation").
 High clustering coefficient: Friends of a user are often also friends with each
other, creating local clusters or "cliques."
o Limitations: While better than ER, it still doesn't fully capture the highly skewed degree
distributions (power-law distribution) seen in many real-world social networks, where
some nodes are exceptionally well-connected "hubs."
3. Barabasi-Albert (BA) Model (Scale-Free Networks):
o This model explains the emergence of "hubs" and the "rich get richer" phenomenon
through two key mechanisms:
 Growth: The network continuously grows by adding new nodes.
 Preferential Attachment: New nodes are more likely to connect to existing
nodes that already have a large number of connections (i.e., popular nodes attract
more connections).
o This model is highly relevant to social media. It explains why platforms like Twitter or
Instagram have a few highly influential users (celebrities, public figures) with millions of
followers, while most users have only a few. The degree distribution of BA networks
follows a power-law, which is a common characteristic of many real-world complex
networks.
o Advantages: Generates networks with realistic degree distributions (scale-free property)
and accounts for network growth.
o Limitations: It often fails to produce the high clustering coefficients seen in real social
networks as effectively as the Watts-Strogatz model.
4. Stochastic Block Model (SBM):
o Unlike the previous models that focus on global properties, SBM is a generative model
for graphs with community structure. It partitions nodes into "blocks" or communities,
and the probability of an edge existing between two nodes depends on the communities
they belong to. For example, connections are more probable within communities than
between them.
o This model is excellent for identifying and analyzing communities or groups within
social media networks (e.g., interest groups, political factions, fan communities). It can
be used for community detection, understanding user segmentation, and analyzing
interaction patterns within and between groups.
o Advantages: Explicitly models community structure, which is a pervasive feature of
social networks.

Applications of Random Graph Models in Social Media Analytics:


 Understanding Network Structure: Analyzing properties like degree distribution, clustering,
and path length to characterize the network's topology.
 Community Detection: Identifying groups of users with denser connections among themselves
than with others.
 Influence and Contagion Modeling: Simulating how information, trends, or even
misinformation spreads through the network. This can inform strategies for viral marketing,
public health campaigns, or combating fake news.
 Predicting Network Evolution: Using models to anticipate how a social media platform might
grow and how new connections will form.
 Anomaly Detection: Identifying unusual patterns of connection that might indicate bots, spam
accounts, or malicious activity.
 Targeted Advertising and Recommendations: Understanding user connections can help
recommend new friends, content, or products.
 Robustness and Vulnerability Analysis: Studying how the network's connectivity is affected by
the removal of nodes (users) or edges (connections), which is crucial for network resilience.

Challenges and Limitations:

Despite their utility, random graph models have limitations in social media analysis:

 Simplification of Reality: Real social networks are far more complex than most models assume,
incorporating factors like temporal dynamics, weighted edges (strength of relationships), multiple
types of relationships, and non-random human behavior.
 Static vs. Dynamic: Many basic random graph models are static, while social media networks
are constantly evolving (new users join, old ones leave, connections are made and broken).
Dynamic network models are needed to capture this fluidity.
 Assumptions: Each model relies on specific assumptions that may not perfectly hold true for all
social media platforms or types of interactions.
 Data Size and Complexity: Analyzing real-world social media graphs, which can have billions
of nodes and edges, requires significant computational resources.
 Bias in Data: Data collected from social media platforms can be biased, affecting the accuracy of
the models.

Random graph models provide a powerful theoretical foundation for understanding social media
networks. While simpler models offer fundamental insights, more advanced and complex models
are continually being developed to better capture the intricate realities of online social
interactions.
Network evolution?

You might also like