Unit Ii
Unit Ii
“Social network analysis is the mapping and measuring of relationships and flows between
people, groups, organisations, computers or other information/knowledge processing entities”
Social Network Analysis (SNA) is a method for visualizing our people and connection power,
leading us to identify how we can best interact to share knowledge.
What is graph theory: Graph is a mathematical structure that allows you to represent everyday
problems in a graphic way. In addition, network theory allows you to represent only one type of
relationship (simple representation), but it also allows you to represent more than one type (in that
case, it would be called multiple).
A graph is a mathematical abstraction that consists of two sets: a set of nodes (also called
vertices/actors/user) and a set of edges (also called links/relationships/connection).
Nodes represent the entities in the network, such as users, posts, hashtags, or products,
Individuals or organizations.
Edges represent the relationships or interactions between the nodes, such as following, liking,
commenting, or buying.
A graph can be directed or undirected, depending on whether the edges have a direction or not. A
graph can also be weighted or unweighted, depending on whether the edges have a numerical
value or not.
A graph-generally denoted G (V, E) or G= (V, E) - consists of the set of vertices V unitedly with a set of
edges E. The number of vertices in a graph is normally denoted n while the number of edges is normally
denoted m.
3.Let’s think about the commercial strategy that any telecommunication company seeking to
know the composition of the links would carry out. You would be interested in knowing which
people we usually talk to and thus adapt your commercial strategy to offer personalized offers
and/or
rates.
In addition to this, applying the networks to social networks can work to adapt the products to
the real needs, making them appear at the right time.
When we talk about networks applied to social networks the most common is that they are used
to “detect communities”. Thanks to the algorithms we can see characteristics, attributes and
relationships that match within a group.
When we analyze the subnetworks, we can see the vertices that are most related to each other,
and also how they relate to the rest of the vertices.
If we look at the graph above, we can see that three different communities have been detected, in
which we can assume that all the members of the same community have characteristics or
attributes.
Graphs play a crucial role in modeling and analyzing social networks. In a social network,
people (users) are represented as nodes (vertices), and relationships (friendships, follows,
interactions) are represented as edges (links).
Adjacency matrices:
Every network can be expressed mathematically in the form of an adjacency matrix. In these
matrices the rows and columns are assigned to the nodes in the network and the presence of an
edge is symbolised by a numerical value. By using the matrix representation of the network, we
can calculate network properties such as degree, and other centralities by applying basic concepts
from linear algebra.
An adjacency matrix is a two-dimensional array that stores the presence or absence (or weight)
of an edge between every pair of nodes.
If there are n nodes, the matrix will be n × n in size.
Types:
If the matrix as a whole is called X, then the contents of any given cell are denoted x ij. For example, in the matrix
above, xij = 1, because Andy likes Bill. Note that this matrix is not quite symmetric (xij not always equal to xji).
Anything we can represent as a graph, we can also represent as a matrix. For example, if it is a valued graph, then
the matrix contains the values instead of 0s and 1s.
By convention, we normally record the data so that the row person "does it to" the column person. For example, if
the relation is "gives advice to", then xij = 1 means that person i gives advice to person j, rather than the other way
around. However, if the data not entered that way and we wish it to be so, we can simply transpose the matrix. The
transpose of a matrix X is denoted X'. The transpose simply interchanges the rows with the columns.
Graphs are fundamental notion to building, analyzing, and optimizing social networks.
Whether it’s recommending friends, detecting influencers, or securing platforms, graph-
based techniques drive modern social networking features.
Path and Connectivity:
Path refers to a sequence of connected vertices in a graph, essentially a route from one vertex to
another along edges, while connectivity describes whether there exists a path between every pair
of vertices within a graph, meaning a connected graph has a path between any two vertices
within it; if not, it's considered disconnected.
The most general form of connection between two actors in a graph is called a walk. A walk is a
sequence of actors and relations that begins and ends with actors.
Path: Path in a network is the sequence of edges leading from one node to another. The number of edges
between two nodes on a given path is considered distance. Related, the length of the shortest path
between two nodes is commonly called the “geodesic distance.”
Perhaps the most useful definition of a connection between two actors (or between an actor and itself) is a
path. A path is a walk in which each actor (and therefore each relation) in the graph may be used at most
once. The single exception to this is a closed path, which begins and ends with the same actor. All paths
are trails and walks, but not all walks and all trails are paths. In Figure, there are a limited number of
paths connecting A and C: {A, B, C}; {A, B, D, C}; {A, B, E, D, C}.
The length of a path is the number of relations in it. The length of a shortest path between two actors is
the geodesic distance between them. Thus, the geodesic distance between A and C in the graph of Figure
is 2.
Connectivity:
Connectivity is a property of a network (not of its individual actors) that extends the concept of
adjacency. If it is possible to establish a path from any actor to any other actor of a network (e.g. every
actor is reachable by every other one), the network is said to be connected; otherwise the network is
disconnected. The network represented in Figure is connected.
Adjacency refers the direct connection between two actors, it may be directed or un-directed. If
there are more path ways connect two actors then there is a high connectivity to reach the actors
from different paths.
Society A 3 5 1 8
Society B 4 6 2 7
Society C 1 2 0 1
Society D 2 5 3 4
From the table the Actor Society C is having weak connectivity. If any Actor refuses to send
message to one then it is difficult for Actor Society C to get most of the information.
Network Distance: The distance between links/ties. For example a link means Dev is the friend
Sharma. Sastri is the friend of friend of Dev which is distance of 2. Rao is the friend of friend of
friend of Dev which is having the network distance of 3.
Breadth First Search is referred to as BFS. A vertex-based method is used to determine the
shortest path in a graph.
In the BFS traversal approach, every node on a level is first investigated before moving on to the
next level.
The BFS traversal uses a queue data structure. The idea of backtracking is not used in BFS.
BFS identifies the route from the source to the destination vertex that travels the fewest number
of edges.
For vertices that need to be found near the source vertex, BFS traversal is the best option.
The Breath First Search algorithm traverses the graph level by level. Let's express it
straightforwardly. As active users of LinkedIn, you may have noticed the presence of 1st, 2nd,
and 3rd connections. The definitions are presented below.
2nd connection: People who are directly connected to your 1st connection but are
not directly connected to you.
3rd connection: People who are directly linked to your 2nd connection but do not
have a direct connection with you.
As you might expect, when employing BFS to expand a network, the process involves traversing
all 1st connections before moving on to the 2nd connections and subsequent levels. This is an
accurate representation of how Breadth-First Search operates within a graph. The path will
traverse the graph in a layer-by-layer manner, rather than delving deeply into specific branches.
This can be further illustrated with a simple example. Suppose the network structure represents a
group of individuals, where connections indicate friendship. Let us commence with person A.
It is evident that individuals A and B, as well as A and C, form friendships, whereas A and E,
and B and C, do not. As you can envision, the sequence will encompass A, followed by B, C,
and D, and ultimately, E, F, G, H, and I. This will be demonstrated through both visual
representations and code.
Here, the visited nodes will be marked as red, the current node (or equivalently, the first node in
the queue) will be marked as green, nodes in the queue but not the current node and not in the
visited set will be marked as purple, and unvisited nodes will be marked as blue in the tree, while
all nodes in the Visited and Queue will remain blue as usual. Initially, the items will be placed in
the Queue and then transferred to Visited. Subsequently, any neighbors that are not
in Visited will be added to the queue. This process will be repeated multiple times until all nodes
(people) have been checked, or until there are no remaining items in the queue. The
term Queue is utilized in reference to a data structure known as a queue, which adheres to the
principles of first in, first out (FIFO). Furthermore, these images depict the conclusion of the
respective procedures.
Step 1: Checking over A.
Figure 2: Checking over
Step 2: Moving A into the Visited and subsequently adding its neighboring nodes, B, C, and D,
to the Queue. It is important to note that B is marked as green, while C and D are marked as
purple, reflecting the current focus on node B.
Figure 3: Moving
Step 3: B is moved into the Visited, and then the neighbors of B, E, and F, are placed in
qthe Queue. While A is a neighbor of C, it cannot be added to the Queue as it is already in
the Visited.
Figure 4: Moving
Step 4: C is moved into the Visited category, and then the neighbor of C, which is G, is placed
in the Queue. Once more, the addition of A to the Queue is not permitted.
Figure 5: Moving
Step 5: D is moved into the Visited, and subsequently, the neighbors of D that are not already in
the Visited, namely H and I, are placed in the Queue.
Figure 6: Moving
Step 6: We are at E now. It is important to note that although E has a neighbor B, once it is
marked as Visited, B can no longer be added to the Queue. Consequently, E can be moved into
the Visited category.
Figure 7: Moving
Step 7: Moving F into the Visited category in a similar manner. The neighbor of F, which is B,
cannot be added to the Queue as it has already been visited.
Figure 8: Moving
Figure 9: Moving
Step 9: Moving H into the Visited.
In this instance, all nodes are depicted in red, indicating that the entire graph has been traversed
and the process has been finalized. The Visited path encompasses the
sequence A, B, C, D, E, F, G, H, and I, progressing from the upper level to the middle level, and
ultimately to the lower level.
Example, In Network broadcasting involves the transmission of information to receiving devices.
In general, a device will transmit information to its nearest recipient devices. They are referred to
as first-degree friends. The individuals classified as 1st-degree friends will disseminate the
information to their acquaintances, thereby functioning as 2nd-degree friends. This process will
continue until all recipients within the network have received the message.
This process progresses from one level to the next, mirroring the operational principles of
Breadth-First Search (BFS).
Because it necessitates first studying every surrounding node, it is not appropriate for the
decision tree.
Facebook, like many social networks, likely employs variations of graph algorithms for tasks
such as finding the minimum distance between users or checking connectivity. While the specific
algorithms used by Facebook are proprietary and not publicly disclosed, we can discuss the
general approaches:
1. Breadth-First Search (BFS): This algorithm is typically used for finding the
shortest path in unweighted graphs. If Facebook treats the social graph as
unweighted (where each connection has the same importance), BFS would be a
suitable choice to find the minimum distance between two users or to check if they
are connected.
2. Depth-First Search (DFS): While DFS can be used to explore the graph, it is not
optimal for finding the shortest path in unweighted graphs. It may be used in certain
scenarios, but it is less efficient than BFS for connectivity checks.
In practice, Facebook might use a combination of these algorithms along with optimizations and
heuristics, especially given the scale and complexity of their social graph.
Here's a breakdown of the key components involved in network datasets and how they relate to
each other:
1. Nodes (Vertices/users/actors)
2. Ties (Edges/relationships/connections)
● Graph Theory: The mathematical study of graphs is the foundation for understanding
the structure of networks.
● Community Detection: Identifying clusters of nodes that are more densely connected to
each other than to the rest of the network.
● Centrality Measures: Identifying the most important or influential nodes in the network.
● Network Dynamics: Studying how the network changes over time, such as in response
to changes in connections or external factors.
Network datasets offer rich insights into complex systems, and analyzing them can reveal
important patterns and behaviors that are not easily identifiable through other means.
5. Influencers/Centrality:
● Definition: Influencers are individuals or nodes within the network that have significant
impact or control over the behavior of other nodes.
● Actors those who are connected with more number of actors in the network has the maximum
centrality. This type of actors are having advantages as acting like bridges, third parties, getting
more information, transferring more information and having more alternative paths.
● In social networks, influencers are often key figures who drive trends, spread
information, or affect the decisions of others. They can be identified through measures
like centrality (e.g., betweenness centrality, degree centrality, eigenvector centrality).
Key Measures of Influence:
1. Degree Centrality:
o Measures the number of direct connections a node has.
o A higher degree centrality means the node is well-connected and can influence many
others.
o Example: A popular Instagram user with thousands of direct followers has high degree
centrality.
2. Betweenness Centrality:
o Measures how often a node acts as a bridge along the shortest path between other nodes.
o Nodes with high betweenness centrality play a key role in controlling information flow
in the network.
o Example: A journalist on Twitter who connects different communities by sharing
information between them.
3. Eigenvector Centrality:
o Measures a node’s influence based on the quality and number of its connections.
o A node is influential if it is connected to other highly influential nodes.
o Example: A celebrity followed by other famous influencers has a high eigenvector
centrality.
4. Closeness Centrality:
o Measures how close a node is to all other nodes in the network.
o Nodes with high closeness centrality can access and spread information quickly.
o Example: A company’s CEO who is directly connected to all department heads, enabling
rapid decision-making.
o
Making Connections
Making connections is the foundational process of identifying and representing relationships between
individuals, groups, or entities within a network. This process translates real-world social interactions into
formats amenable to analysis using network science methods, serving as the critical first step in
understanding the complex dynamics of social networks.
● Homophily: People tend to associate with others who share similar interests, beliefs, or
backgrounds (e.g., political views, profession).
● Group Membership: Individuals form or join groups based on shared goals, values, or
interests (e.g., online communities, religious groups).
● Social Capital: Networks provide resources, information, and support, helping
individuals gain advantages in social and professional life.
● Structural Balance: Social relationships often exhibit balance, where individuals adjust
connections to maintain harmony in affiliations.
Examples of Affiliation:
Identity is the way individuals define themselves and are recognized within a social network. It
influences how people interact, the groups they belong to, and their perceived social roles.
Typically, the nodes represent specific data points, and the links represent the connections
between them.
If you need to understand connections in your data, there can be huge advantages to looking
beyond your ‘flat’ data model with powerful link analysis tools.
Link analysis requires a step-by-step approach, which will vary depending on individual
technology stacks, and the maturity of your data management processes. A generic link analysis
process could be:
Step 1 – data collection – from various structured and unstructured sources. A benefit of link
analysis is the ability to connect information from disparate silos, for a more unified view of
information.
Step 2 – data cleansing and entity extraction – this involves normalizing and deduplicating
information, and ensuring nodes and links have the right attributes and identifiers for your
analysis.
Step 3 – building the data model – this is the translation of a conceptual model, based on the
information you want to uncover, into a logical model, based on the information you have. Read
more about data modeling.
Step 4 – visualization and analysis – this is when you see and interact with nodes and links in a
link analysis chart. You can enhance your visual analysis with various techniques and
algorithms, including centrality analysis, automated layouts, timeline visualization and node
grouping.
The world is densely connected, and link analysis will help you understand those connections.
Unlike other forms of data analysis, link analysis focuses as much on the connections between
data as on the data points themselves. That makes it possible to uncover hidden relationships and
identify patterns, like dependencies or anomalies, that are not visible in aggregated analysis.
It’s intuitive: Exploring networks as node-link structures instantly makes sense, even to people
who’ve never worked with connected data before.
It’s fast: Our brains are great at spotting patterns, but only when the info is in a tangible format.
Link analysis helps you identify trends and outliers quickly.
It’s holistic: Link analysis makes it possible to connect data points from diverse sources,
breaking down silos for a holistic view of the full data context.
It’s scalable: Link analysis lets you simplify complexity, see context and understand detail.
With one chart, you can get an overview or dive into specific connections.
It’s insightful: Through interactive data analysis, you gain deeper knowledge and understand
context. That’s hard to achieve with a static, aggregated visualization.
Cybersecurity: Find patterns, anomalies and outliers in your complex threat landscape so you can
predict, detect and prevent cyber threats.
Financial crime & fraud: Analyze connections between people, places, transactions and events to
manage compliance and defeat fraudsters.
Security & Intelligence: Visualize threat information from communications data and intelligence sources
to beat organized crime and terrorism.
Law enforcement: Reveal connections, understand patterns, predict behaviors and gain insight that can
prove, solve or prevent crime.
Blockchain & crypto: Simplify blocks of transactions, track the movement of funds and verify crypto
users to counteract criminal activity.
Knowledge graphs: Make large-scale graphs accessible, understandable and intuitive through visual data
discovery and interactive analysis.
Infrastructure: Map complex systems, networks and pipelines to identify critical paths, prevent
bottlenecks and uncover vulnerabilities.
Supply chain: Power end-to-end visibility, efficiency and traceability in supply networks to optimize
performance at every stage.
Weighted Networks:
A weighted network is a graph where the edges have associated weights that represent the
strength, capacity, cost, or some other quantitative value of the connection between nodes. These
networks extend the concept of unweighted graphs, making them more suitable for real-world
scenarios where relationships are not uniform in strength.
1. Nodes (Vertices):
o Represent entities such as individuals, objects, or locations.
2. Edges (Links):
o Represent connections or relationships between nodes.
3. Weights:
o Quantitative values assigned to edges (e.g., frequency of interaction, distance, trust level,
flow capacity, monetary value). Weights can be positive or negative, representing
beneficial or detrimental relationships, respectively
Strength (Weighted Degree): The sum of edge weights connected to a node. This metric
quantifies the total influence, activity, or capacity associated with a node. For example, in a
social network, a user's strength might be the total number of messages sent and received.
o In-Strength / Out-Strength: For directed weighted networks, representing the sum of
incoming and outgoing edge weights, respectively.
Weighted Clustering Coefficient: Measures how tightly a node’s neighbors are connected,
taking weights into account. It quantifies the "cliquishness" of a node's local neighborhood,
giving more importance to stronger connections.
Weighted Shortest Path: Pathfinding algorithms (e.g., Dijkstra’s algorithm, Bellman-Ford
algorithm) consider edge weights to find the "shortest" or most efficient path between two
nodes, where "shortest" could mean least cost, shortest time, or highest capacity.
Weighted Betweenness Centrality: Measures the importance of a node in terms of weighted
shortest paths passing through it. A node with high weighted betweenness centrality acts as a
crucial "bridge" or "bottleneck" in the flow of information or resources.
Weighted Closeness Centrality: Measures how "close" a node is to all other nodes in the
network, considering the sum of weighted shortest paths. Nodes with high weighted closeness
centrality can quickly reach other nodes.
Weighted PageRank/Eigenvector Centrality: Extensions of centrality measures for unweighted
networks that incorporate edge weights to assess the influence or importance of nodes based
on the strength of their connections and the importance of their neighbors.
Hypergraphs:
A hypergraph generalizes the concept of a graph by allowing edges (called hyperedges) to
connect more than two nodes simultaneously. This is particularly useful for representing multi-
way relationships.
Structure of a Hypergraph:
1. Nodes (Vertices):
o Entities or elements in the system.
2. Hyperedges:
o Subsets of nodes that define multi-way relationships. Each hyperedge can connect
an arbitrary number of nodes.
1. Degree:
o Node degree: The number of hyperedges a node belongs to.
o Hyperedge size: The number of nodes in a hyperedge.
2. Clustering in Hypergraphs:
o Measures the tendency of nodes to form clusters in multi-way interactions.
3. Centrality:
o Captures the importance of a node or hyperedge in terms of the overall structure.
Influence and Virality Analysis: By looking at how information spreads through multi-way
interactions, hypergraphs can help us identify which users or groups are central in spreading
viral content or influencing the larger network.
Group Chats (WhatsApp, Slack, etc.): In a group chat, a hyperedge can represent the group
itself. Rather than connecting individual pairs of people (as in a traditional graph), a hyperedge
connects all the users in the group simultaneously, allowing us to analyze the dynamics of
interactions within the group.
Event Participation: If multiple users attend an event or participate in a hashtag trend, we can
represent these multi-way relationships using a hyperedge. Instead of connecting pairs of users
who liked a post, the hyperedge would capture all users who attended or participated in the
event.
Random graph:
Random graph models are fundamental tools in social media analytics and data analysis,
providing a mathematical framework to understand and simulate the structure and evolution of
social networks. They help researchers gain insights into phenomena like information spread,
community formation, and influence. or
A random graph is a type of graph where edges (connections) between nodes (users or entities)
are formed randomly based on some probability. It is commonly used in social media analytics
to model and study unpredictable patterns of user connections and interactions.
Nodes (vertices) typically represent users, profiles, or pieces of content (e.g., posts, tweets).
Edges (links) represent connections or interactions between these nodes (e.g., friendships,
followers, retweets, likes, mentions).
Random graph models are generative models, meaning they provide rules or probabilities for
creating connections between nodes, allowing us to understand how different network structures
emerge. The "randomness" comes from the probabilistic nature of these connections.
Despite their utility, random graph models have limitations in social media analysis:
Simplification of Reality: Real social networks are far more complex than most models assume,
incorporating factors like temporal dynamics, weighted edges (strength of relationships), multiple
types of relationships, and non-random human behavior.
Static vs. Dynamic: Many basic random graph models are static, while social media networks
are constantly evolving (new users join, old ones leave, connections are made and broken).
Dynamic network models are needed to capture this fluidity.
Assumptions: Each model relies on specific assumptions that may not perfectly hold true for all
social media platforms or types of interactions.
Data Size and Complexity: Analyzing real-world social media graphs, which can have billions
of nodes and edges, requires significant computational resources.
Bias in Data: Data collected from social media platforms can be biased, affecting the accuracy of
the models.
Random graph models provide a powerful theoretical foundation for understanding social media
networks. While simpler models offer fundamental insights, more advanced and complex models
are continually being developed to better capture the intricate realities of online social
interactions.
Network evolution?