0% found this document useful (0 votes)

49 views62 pages

Social Network Analysis Guide

The document discusses analyzing social networks through attributes, behaviors, and content of nodes. It describes identifying clusters in networks and interpreting them based on node and edge definitions. Methods for building networks like sampling and defining node and edge types are covered.

Uploaded by

yuanzwang5

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

49 views62 pages

Social Network Analysis Guide

Uploaded by

yuanzwang5

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 62

Big Data Analytics

MS4252
Social Network Analysis III

Sources: Jennifer Golbeck 2013. Analyzing the Social Web.

Elsevier (chapter 7, 8, 9, 13)

1
Chapter 7

UNDERSTANDING STRUCTURE
THROUGH ATTRIBUTE AND
BEHAVIOR
2
Describe the Network
 To really understand what’s happening in a
network, look beyond the structure at
 users’ attributes, behaviour, the content they are
sharing, and their interactions

 Analyzing this content can lead to many insights

into the meaning of network structure

3
Attribute, Behavior, and Content
 Attributes are characteristics of a node
 Age, gender, or location
 Person’s religion or political preferences

 Behavior refers to actions of nodes

 How frequently she posts,
 Who and how many people she follows, or
 How often she clicks on shared links.

4
Attribute, Behavior, and Content
 Content is a broader term that refers to the
nonstructural information about a node
 Includes any information about the nodes or edges
beyond the structural features.
 Combination of attributes and behavior, like the types of
comments they post online or the topics they discuss.
 If a node represents a video or picture instead of a
person, the content would include the things depicted in
the video.

5
Structural Analysis
 Identify those clusters
 provide statistics about the network properties (like
density and connectivity)
 about the importance of individual nodes with measures
(like centrality)

6
How to interpret the three clusters?
 Built from the photo sharing website Flickr
 On Flickr, photos are labeled with descriptive
keywords called tags
 Nodes represent tags and an edge between tags
indicate that they were used to describe the
same image.
 e.g. if an image is tagged with the word “desk”
and “keyboard” the network would show a line
connecting those two words.
 Network is a 1.5 egocentric network of a single
tag

7
Now with content

8
Break down into Egocentric Network
 The network of three months of discussion on the CSS-Discuss
mailing list. Node size reflects the node’s out-degree in this directed
network. Edge means the the person has replied to another.

9
Chapter 8

BUILDING NETWORKS

10
First Question?
 What do the nodes represent?

 What do the edges represent?

 Know this before doing anything with data!

11
Example
 Facebook Network
 What are the nodes and edges?

 What can you see if you were able to visualize this

network?
 What would network statistics mean in this network?
 What patterns might emerge?

12
Network with Multiple node types
 Bipartite graphs have two node types that do not
have connections within the type
 E.g. no people connected to one another

 Graphs can have multiple node types and not be

bipartite

13
To build a network
 Step 1: Define Nodes
 What are they?
 What are the criteria for being included?

 Example, build a social network of people in

Company X.
 Full time employees
 Part-time employees
 Contractor who are hired to work temporarily
 …

14
To Build a Network
 Step 2: Define Edges
 What does a edge represent?
 What is the criteria for adding one?

 Same Example: Employee network at Company

X, two people are connected if
 Two people work in the same dept and work closely on
many projects.
 One person works for another.
 Two people participate in department-wide discussions
on a mailing list.

15
Sampling Methods
 Some networks may be too big to analyze in
their entirety.
 Millions of nodes and edges are difficult to understand,
impossible to visualize.

 Sampling-selecting a subset of nodes and edges

—is an effective and common way of obtaining a
reasonably sized dataset from a large one.
 Random sampling
 Snowball sampling
 Egocentric analysis

16
Example: Enron email network
 Node: messages
 Edge: any pair of nodes that have exchanged at least 10
emails.

17
Random Sampling
 Select a certain percentage of nodes and keep all edges
between them, or
 Select a certain percentage of edges and keep all the
nodes that are mentioned.
Random Edge Sampling

50% 25% 10%

18
Random Sampling
 Problems
 Edge sampling biased toward high degree nodes
 Node sampling loses some structural characteristics.

 Benefits
 Reduce the network to a smaller size in an even way
 A picture of the overall patterns of relationships and
clusters can be seen
 Node sampling keeps some network statistical features

19
Snowball Sampling
 A technique commonly used in sociology where
participants are interviewed

 When working with a large network, choose a

starting node (initial seed node)

 Get that node, its connections, their connections, and so

on until the network is the right size for analysis

20
Snowball Sampling
 Problems
 Biased toward the part of the network sampled, may
miss other features
 Benefits:
 Easy to do, common

21
Egocentric Network Analysis
 Instead of looking at the whole network, look at
the egocentric networks of some nodes.
 Randomly selected egocentric networks, or
 The networks of individuals selected based on certain
characteristics

 A different type of analysis than overall network

analysis, but it shows the role of an individual in
context

22
Egocentric Network Analysis
 A 1.5 egocentric network of a person who posts
in both English and Spanish
 People who post only in Spanish are shown in black,
those posting only in English are in white, and people
using multiple languages or a third language are in
gray.

23
Chapter 9

ENTITY RESOLUTION AND LINK

PREDICTION

24
Overview
 Link Prediction is a method of analysis that detects
where missing links should be present in the network.

 Entity resolution is a technique for merging nodes that

represent the same person.

25
Link Prediction
 Data often has errors in it, including missing
links,
 and link prediction could identify places where an
analyst might want to check to confirm that there
is no edge between a pair of nodes
 It can also be used to identify people.

The frequency to attend the meeting

26
Systematic methods for predicting link
 If we have two nodes, A and B, then the
score(A,B) indicates how closely connected A and
B are in the graph
 After computing the score for every pair of
nodes, the algorithm returns a ranked list
 The pairs with the highest score are predicted to
be the most likely new edges.

27
Score Method
 Shortest Path Length

 Common Neighbors

 Jaccard Index

 Adamic/Adar

 Preferential Attachment

28
Score – Shortest Path
 One of the simplest ways to score the similarity
or closeness of two nodes is to use the shortest
path length between them
 Score(A,B) = - shortestPath(A,B)

29
Score – Common Neighbors
 Another way of computing score
 uses more information from the network structure is to
count the number of common neighbors between the
two nodes in a pair
 Score(A, G) = Neighbor(A) ∩ Neighbor(G)

30
Score – Common Neighbors
 The number of common neighbors makes social
sense, too.
 The more friends two people have in common, the more
likely they are to be introduced to one another.

 However, some people have an abnormally high

number of connections in social networks, like
singers
 many friends in common may not mean much since
they are connected

31
Score – Jaccard Index
 The Jaccard Index counts the total number of
friends in common and divides that by the total
number of people who are friends of either node

32
Example
 Four nodes: Alice, Bob, Chuck, and Dave.
 Let Alice and Bob be celebrities, each with 1 million
friends; Chuck and Dave are average users with 100
friends each.
 Now say Alice and Bob have 2,000 friends in common
while Chuck and Dave have only 20 friends in common.
 Although Alice and Bob may seem to be more strongly
connected than Chuck and Dave, since they have 100
times more common friends, the Jaccard Index
indicates this is not the case
 What is Jaccard Index for Alice and Bob? How
about Chuck and Dave?
 What if the 20 people Chuck and Dave know in
common are also celebrities
33
Score – Adamic/Adar
 Sum up the inverse log of each neighbor’s degree
(common)
 Giving more weights to friends with fewer edges.
 The formula is

34
Score – Preferential Attachments
 The network principle states nodes with a high degree
are more likely to gain new links.
 Popular nodes are more likely to gain new friends than less
popular nodes.

 When predicting edges, preferential attachments

suggest that nodes with high degree are more likely to
gain new edges.

35
Score – Preferential Attachments
 The formula for this score method is relatively is

 With this measure, we would predict that the

next edge to appear will be between nodes A and
G.

36
Entity Resolution
 Identify nodes that represent the same entity
and then to merge them together
 Entity resolution is a technique for merging
nodes that represent the same person.
 John and J. Smith are actually the same person

37
Scoring Techniques
 For each pair of nodes, we can compute a score that
represents the likelihood that they are the same node.
 Then we can set a threshold value.
 Any pair of nodes with score above the threshold will be
merged.

38
Scoring Techniques
 To create a score for each pair of nodes, we will
consider similarity on a set of attributes.
 For each pair of nodes, record 1 if their values match and 0
otherwise for a given attribute.

 However, matching on some attributes is more important than

on others (weights).

 Then, we add the positive weights for each attribute where

nodes match, and add the negative weights when they do not
match.

39
Scoring Techniques
 Example: two nodes (J Smith and John Smith) and five
attributes
 The vector of match/nonmatch is (0, 1, 0, 0, 1)
 Suppose the weights are ()
 We know that because a match on SSN is much
definitive.

40
Scoring Techniques
 We are left to find a method for computing the weight
for each attribute.
 We need two probabilities.
 Probability : the probability that the two nodes will match on a
attribute by chance.
 Example, same birth month is 1/12

 Probability : the probability that two nodes that represent the

same person will have the same value (close to 1).
 Example, the m probability for the birth month is 0.95

41
Scoring Techniques
 Once we have the and probabilities, we need to turn
them into weights.
 There are two weights for each attribute.

 The common formulas are:

 For a match,
 For a nonmatch,

 For the month attribute, for a match and for a

nonmatch.

42
Score for partial match
 Return to the `John Smith’ and `J Smith’ example, while
their first names are not an exact match, they are
closed.
 We may label this as a partial match.
 For example, if we say `J’ is a 0.3 match for `John’, we would
add 0.3 times the weight for a name match.

 In this approach, the formula that a partial match with

value we would add the following to the score:

 Comparison? 43
Application
 Link Prediction
 Friend Recommendation

 Entity Resolution
 Finding duplicated account

https://sensorstechforum.com/new-facebook-scam-fake-duplicate-accounts-fraud/
44
Incorporating Network Data
 Relational data is useful for enhancing the attribute-
based similarity!

45
Decision to merge
 The results from these similarity measures can be used
in addition to attribute data.
 Jointly use two similarity scores!

46
Chapter 13

SOCIAL INFORMATION FILTERING

47
Social Information Filtering
 Some of the information on the social media will be
more useful than the rest, and the sheer volume means
a filter would be useful to sort through it.

 This chapter introduces methods used to sort, filter, and

aggregate information from the social media.

 Two major approaches:

 Social voting
 Recommender systems

48
Social sharing and social filtering
 Social-sharing websites, like Digg, Slashdot, Reddit, are
designed to share interesting content.

 The community then votes items up or down, and the

most interesting links are highlighted.

 The reliance on large number of people to help

complete a task is a type pf crowdsourcing.

49
Automated recommender systems
 Amazon, Netflex, Pandora, use recommender systems
(RS) to suggest items a user might like.

 The RS leverages a user’s previous activity, either on

explicit data such as user ratings, or data implicitly
captured from user’s behavior such making a purchase
or viewing an item.

 The core idea behind RS is to take data created by other

people and personalize it for the individual user.

50
Traditional recommender systems
 Recommender systems work in one the two ways:

 Suggesting items similar to the ones a person likes (item-based)

 Suggesting item liked by people who are similar to the user
(user-based; collaborative filtering)

 Item based recommendation is not very social, but the

user-based RS relies on other people’s actions.

51
Traditional RS: example
 How to suggest new items to Alice? How much she
might like movie Vertigo?
 Compute the correlation between Alice and Bob (0.26),
and Alice and Chuck (0.83)!

Vertigo

?
3
4 52
Social Recommender Systems
 Collaborative Filtering: an example of how algorithms
can leverage data from crowd!

 With the rise of social media, RS have started to

consider social connections in addition to similarity.

 Example: In twitter, when a user searches for a term,

the search results can be shown in three ways:
 All tweets that match the search
 ‘Top’ tweets
 Tweets only posted by people the user knows

53
Case Study: Reddit Voting
 Social News Website
 Users vote stories up or down

 How to count votes?

 Using the total is biased to older posts
 Using percentage can skew high or low for new posts

54
Reddit

55
Case Study: Reddit Voting
 Solution:
 Estimate how many votes a post would get over time

 Use a 95% confidence interval to forecast.

 When Reddit is relatively sure a new post will get more

up votes than an older one, it ranks it higher.

56
Case Study: Trust-based RS

57
Trust-based Recommender Systems
 Trust-based recommenders, like FilmTrust, use social
network information.

 Instead of computing similarity between two users as

the weight, we can use trust value from social network
data.

 The system estimates how much the user might trust

other people’s opinion. How?

58
Trust
 Alice does not know Dave.

 Dave is trustworthy at a level 3 out of 10, and Chuck gives Dave

a 10 out of 10
 Alice could average these values, and estimate Dave’s
trustworthiness at (10+3)/2=6.5.
 However, Alice trusts Bob’s opinion far more than Chuck’s.

59
Weighted average
 Alice has given Bob a higher trust rating

 Weighted average

60
Example with People
 Bob, Chuck, and Dave all saw and rated the move
`Night of the Living Dead’.
 Their ratings are as follows on a five-star scale:
 Bob: 5 stars
 Chuck: 2 stars
 Dave: 4 stars

 Recommended rating:

61
How good is a RS?
 Generally: RMSE

 Error = My rating – Recommended Rating

 Compute the RMSE

 Back to Example, when user’s rating of a movie moved away

from the average rating, the trusted based rating greatly
outperformed collaborative filtering algorithm.

 Need alternative ways of evaluating systems

 Serendipity over accuracy
 Diversity

Topic 3 New
No ratings yet
Topic 3 New
48 pages
Social Network Analysis
No ratings yet
Social Network Analysis
22 pages
Notes - SN
No ratings yet
Notes - SN
5 pages
Web Data Analysis for PhD Students
No ratings yet
Web Data Analysis for PhD Students
93 pages
Social Network Analysis (2017 Reg) - Unit2
No ratings yet
Social Network Analysis (2017 Reg) - Unit2
47 pages
Social Network Analysis: Lakshminarayana Sadineni Assistant Professor Department of Iot & Is
No ratings yet
Social Network Analysis: Lakshminarayana Sadineni Assistant Professor Department of Iot & Is
23 pages
Social Networks
No ratings yet
Social Networks
85 pages
Mod1 2
No ratings yet
Mod1 2
21 pages
SNA-UNIT-1 Full
No ratings yet
SNA-UNIT-1 Full
84 pages
Week 16
No ratings yet
Week 16
47 pages
Pink and Black Modern Computer Presentation - 20250910 - 142323 - 0000
No ratings yet
Pink and Black Modern Computer Presentation - 20250910 - 142323 - 0000
17 pages
Unit 5 - SNA AK
No ratings yet
Unit 5 - SNA AK
13 pages
SMA Unit-II
No ratings yet
SMA Unit-II
54 pages
Module VI - Mining Social Network Graph
No ratings yet
Module VI - Mining Social Network Graph
88 pages
SocialNetworkAnalysis FullNote
No ratings yet
SocialNetworkAnalysis FullNote
10 pages
Module1: Introduction: Prof. Punitha K, VIT Chennai
No ratings yet
Module1: Introduction: Prof. Punitha K, VIT Chennai
50 pages
Introduction To Social Network Analysis
No ratings yet
Introduction To Social Network Analysis
38 pages
Introduction To Social Network Analysis
No ratings yet
Introduction To Social Network Analysis
28 pages
SNA - T4-5 - Pagerank and Communities
No ratings yet
SNA - T4-5 - Pagerank and Communities
56 pages
Social Network Analysis Basics
No ratings yet
Social Network Analysis Basics
43 pages
2025 Sma M2
No ratings yet
2025 Sma M2
247 pages
1.6 L4 - Social Network Structure, Measures Visualization - Part I
No ratings yet
1.6 L4 - Social Network Structure, Measures Visualization - Part I
36 pages
Lesson 1
No ratings yet
Lesson 1
50 pages
15-Social Network Analysis
No ratings yet
15-Social Network Analysis
18 pages
Social Network Analysis Basics
No ratings yet
Social Network Analysis Basics
67 pages
I Am Sharing 'DSE ASSIGNMENT ADITI CHAUDHARY' With You
No ratings yet
I Am Sharing 'DSE ASSIGNMENT ADITI CHAUDHARY' With You
7 pages
Social Network Analysis Guide
No ratings yet
Social Network Analysis Guide
40 pages
End SEm SNA
No ratings yet
End SEm SNA
34 pages
T3 Sna 2324
No ratings yet
T3 Sna 2324
80 pages
BDA Unit - 05
No ratings yet
BDA Unit - 05
7 pages
NodeXL Chapter 3
No ratings yet
NodeXL Chapter 3
19 pages
Lecture 12
No ratings yet
Lecture 12
31 pages
Unit 6 Mining Social Network Graph
No ratings yet
Unit 6 Mining Social Network Graph
9 pages
Social Network Analytics Notes
No ratings yet
Social Network Analytics Notes
14 pages
Data Representation & Networks
No ratings yet
Data Representation & Networks
26 pages
Social Network Analysis
No ratings yet
Social Network Analysis
82 pages
Unit 5
No ratings yet
Unit 5
20 pages
Unit 1 SNA QB
No ratings yet
Unit 1 SNA QB
11 pages
Social Network Analysis in Big Data
No ratings yet
Social Network Analysis in Big Data
20 pages
05 Networks
No ratings yet
05 Networks
48 pages
Sna It Unit5
No ratings yet
Sna It Unit5
20 pages
Social Network Analysis Course Overview
100% (1)
Social Network Analysis Course Overview
39 pages
Topic 2
No ratings yet
Topic 2
46 pages
Soc 260 Notes
No ratings yet
Soc 260 Notes
6 pages
DS Unit 5
No ratings yet
DS Unit 5
89 pages
Topic 2 Scratch
No ratings yet
Topic 2 Scratch
47 pages
Sna Lec1
No ratings yet
Sna Lec1
42 pages
C2 - Social Network Measurement
No ratings yet
C2 - Social Network Measurement
42 pages
Poster Surf
No ratings yet
Poster Surf
1 page
Strong Tie:: Q1. Illustrate Granovetter Strenghth of Weak Ties
No ratings yet
Strong Tie:: Q1. Illustrate Granovetter Strenghth of Weak Ties
10 pages
Key Concepts in Social Network Analysis
No ratings yet
Key Concepts in Social Network Analysis
9 pages
Unit 2 Complete Notes Unit 2 Complete Notes
No ratings yet
Unit 2 Complete Notes Unit 2 Complete Notes
31 pages
A Cohesion Based Friend Recommendation System
No ratings yet
A Cohesion Based Friend Recommendation System
16 pages
Unit Ii
No ratings yet
Unit Ii
28 pages
Link Prediction in Social Networks
No ratings yet
Link Prediction in Social Networks
28 pages
03-Basic Concepts of Network Analysis
No ratings yet
03-Basic Concepts of Network Analysis
40 pages
Mod-5 Bda Super Imp
No ratings yet
Mod-5 Bda Super Imp
22 pages
5network 01 Intro
No ratings yet
5network 01 Intro
202 pages
ENTREPRENEURSHIP
No ratings yet
ENTREPRENEURSHIP
2 pages
Grammar Practice Will Future Predictions Worksheetttt
No ratings yet
Grammar Practice Will Future Predictions Worksheetttt
2 pages
Foundation Engineering Course
No ratings yet
Foundation Engineering Course
70 pages
MSME
No ratings yet
MSME
3 pages
12th Maths EM Queation Types English Medium PDF Download
No ratings yet
12th Maths EM Queation Types English Medium PDF Download
18 pages
Food Microbial Ecology Insights
No ratings yet
Food Microbial Ecology Insights
10 pages
Science Technology and Society - LP3
No ratings yet
Science Technology and Society - LP3
14 pages
Postharvest Handling of Cassava Roots
No ratings yet
Postharvest Handling of Cassava Roots
3 pages
Sample Paper Final Term
No ratings yet
Sample Paper Final Term
6 pages
Quality Circle Report
100% (3)
Quality Circle Report
45 pages
Order Letter 2
No ratings yet
Order Letter 2
3 pages
Integrating Informal Sector Recycling Into Waste Management - Insights From Value Chain Analysis of 3 Asian Countries
No ratings yet
Integrating Informal Sector Recycling Into Waste Management - Insights From Value Chain Analysis of 3 Asian Countries
7 pages
Maritime Cargo Management Exam
0% (1)
Maritime Cargo Management Exam
2 pages
Day 2 Passage 2
No ratings yet
Day 2 Passage 2
4 pages
Troubleshooting Combustion Chamber Leaks
No ratings yet
Troubleshooting Combustion Chamber Leaks
4 pages
Unboxing TP-5013 Power Socket Review
No ratings yet
Unboxing TP-5013 Power Socket Review
1 page
Usage History: Total Balance Used Rs 0.04
No ratings yet
Usage History: Total Balance Used Rs 0.04
4 pages
Fed-Batch and Continuous Culture Kinetics
No ratings yet
Fed-Batch and Continuous Culture Kinetics
30 pages
Unit III. Learning Theories and Models
No ratings yet
Unit III. Learning Theories and Models
40 pages
"B" Shifting Tool Specifications
No ratings yet
"B" Shifting Tool Specifications
3 pages
Schoolhead SY 2024 2025 OPCRF
No ratings yet
Schoolhead SY 2024 2025 OPCRF
12 pages
B&D, 2010, Automotive & Electronic Catalog
No ratings yet
B&D, 2010, Automotive & Electronic Catalog
27 pages
NYCSampleFinalExam v2
No ratings yet
NYCSampleFinalExam v2
22 pages
EPP - ICT - Creating A Multimedia Presentation Using The Advanced Features of MS PowerPoint Tool
No ratings yet
EPP - ICT - Creating A Multimedia Presentation Using The Advanced Features of MS PowerPoint Tool
27 pages
A.P. Student and Teacher Development Plan
No ratings yet
A.P. Student and Teacher Development Plan
2 pages
02 Chapter 2 - Functional Units
No ratings yet
02 Chapter 2 - Functional Units
38 pages
The 4th InTraders International Conference On International Trade Proceeding Book
No ratings yet
The 4th InTraders International Conference On International Trade Proceeding Book
197 pages
TIA EIA 568 B.2 1final
No ratings yet
TIA EIA 568 B.2 1final
86 pages
Tabel Fender AN 1
No ratings yet
Tabel Fender AN 1
2 pages
Project Report
No ratings yet
Project Report
6 pages

Social Network Analysis Guide

Uploaded by

Social Network Analysis Guide

Uploaded by

Big Data Analytics

Sources: Jennifer Golbeck 2013. Analyzing the Social Web.

 Analyzing this content can lead to many insights

 Behavior refers to actions of nodes

 What do the edges represent?

 Know this before doing anything with data!

 What can you see if you were able to visualize this

 Graphs can have multiple node types and not be

 Example, build a social network of people in

 Same Example: Employee network at Company

 Sampling-selecting a subset of nodes and edges

50% 25% 10%

 When working with a large network, choose a

 Get that node, its connections, their connections, and so

 A different type of analysis than overall network

ENTITY RESOLUTION AND LINK

 Entity resolution is a technique for merging nodes that

The frequency to attend the meeting

 However, some people have an abnormally high

 When predicting edges, preferential attachments

 With this measure, we would predict that the

 However, matching on some attributes is more important than

 Then, we add the positive weights for each attribute where

 Probability : the probability that two nodes that represent the

 The common formulas are:

 For the month attribute, for a match and for a

 In this approach, the formula that a partial match with

SOCIAL INFORMATION FILTERING

 This chapter introduces methods used to sort, filter, and

 Two major approaches:

 The community then votes items up or down, and the

 The reliance on large number of people to help

 The RS leverages a user’s previous activity, either on

 The core idea behind RS is to take data created by other

 Suggesting items similar to the ones a person likes (item-based)

 Item based recommendation is not very social, but the

 With the rise of social media, RS have started to

 Example: In twitter, when a user searches for a term,

 How to count votes?

 Use a 95% confidence interval to forecast.

 When Reddit is relatively sure a new post will get more

 Instead of computing similarity between two users as

 The system estimates how much the user might trust

 Dave is trustworthy at a level 3 out of 10, and Chuck gives Dave

 Error = My rating – Recommended Rating

 Back to Example, when user’s rating of a movie moved away

 Need alternative ways of evaluating systems

You might also like