Lectures-merged
2.1
URL: https://www.youtube.com/watch?v=WYB6V0gTJps
Week: "2"
OSM API
Lets you interact with social media platforms
programmatically so users can collect data
Each platform has its own API and rate limits
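Since every platform rate-limits its API, collection code has to back off and retry. A minimal sketch, assuming a hypothetical `fetch_page` callable (not any real client) that returns a dict with a `status` field:

```python
import time

def fetch_with_backoff(fetch_page, max_retries=5):
    """Call a page-fetching function, sleeping and retrying on rate limits.

    fetch_page is a hypothetical callable returning a dict like
    {"status": 200, "items": [...]} or {"status": 429, "retry_after": n}.
    """
    delay = 1
    for _ in range(max_retries):
        resp = fetch_page()
        if resp["status"] == 429:                 # rate limited
            time.sleep(resp.get("retry_after", delay))
            delay *= 2                            # exponential backoff
            continue
        return resp["items"]
    raise RuntimeError("rate limit: retries exhausted")
```

Real clients (Tweepy, PRAW) offer similar behaviour via options like `wait_on_rate_limit`.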
Python
Has a lot of libraries and we will be using it
Data Formats
JSON
XML
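Both formats can represent the same record; the field names below are illustrative, and both parsers are in the Python standard library:

```python
import json
import xml.etree.ElementTree as ET

# The same (hypothetical) post record in both formats
as_json = '{"id": 42, "user": "alice", "text": "hello"}'
as_xml = '<post id="42"><user>alice</user><text>hello</text></post>'

post = json.loads(as_json)        # JSON -> dict
root = ET.fromstring(as_xml)      # XML -> element tree

post["user"]                      # "alice"
root.find("user").text            # "alice"
```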
MySQL
Relational Database to store data, stores data in rows and
columns
Viewed with PhpMyAdmin
MongoDB
Data is stored as objects (documents)
The instructor recommends RoboMongo, but I would prefer
Compose
Graph API
All content in Meta is stored in graph format, and every
interaction is an edge in the graph
All nodes have a unique numeric id
2.2
URL: https://www.youtube.com/watch?v=yf4jNe3X_mg
Week: "2"
Trust and Credibility
The graph shows that many tweets carry rumors and that a
variety of rumors can spread.
True information arrives later than the rumors.
Our Objective:
How to identify rumors and stop them from spreading
How to surface true information earlier
Methodology
1. Data Collection and Filtering
2. Data Characterization
3. Classification Module
1. Feature Generation
2. Obtaining Ground Truth
4. Evaluating Results
Data Collection
Total tweets: 1.7M
Total Unique users: 1.17M
Tweets with Links: 0.6M
Origin of Tweet - Geo Code
Data Filtering
The Guardian manually annotated the data and distributed it
publicly
Fake images were about twice as many as real images
Analysis Helps: Who, When, Where, What, Why and How
Network Analysis:
Within an hour the spread becomes exponential
The user who started a rumor may not be its biggest
spreader
Classification
Some features of the tweets
1. User Profile
2. Network of the User
3. Content (Tweet feature, has a lot of dimensions)
Results:
Data Description in Boston Blast
Spiking peaks as real world events unfold
Other Details in Boston Blast
Fake accounts were created to spread malicious data
Some suspensions were based on people reporting
Some have 73k followers and are still active
True data was mostly posted via web and not mobile
Network Analysis shows that Fake Accounts are a closed
community
Cronbach's alpha tells us how reliable annotated
information is; it is an inter-annotator (inter-rater)
agreement measure. It should be above 0.7 to be acceptable.
Annotation is mostly done manually by people; I doubt that
is still the case now
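The reliability measure above can be sketched directly from its formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the toy ratings matrix is an assumption:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha; annotators in columns, subjects in rows."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                        # number of annotators
    item_var = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)
```

Perfectly agreeing annotators yield alpha = 1; values above 0.7 are conventionally acceptable.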
TweetCred is a Chrome extension that estimates the
credibility of a tweet; it also takes user feedback
NDCG (Normalized Discounted Cumulative Gain)
A metric that evaluates how well a ranking algorithm
organizes items based on relevance
NDCG = DCG divided by the ideal DCG (IDCG)
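The DCG/IDCG ratio can be computed in a few lines; this sketch uses the common log2 discount with 1-indexed positions:

```python
import math

def dcg(relevances):
    # DCG = sum of rel_i / log2(i + 1) over 1-indexed positions i
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))  # best possible ordering
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking already sorted by relevance scores NDCG = 1; any worse ordering scores below 1.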
2.T
Week: "2"
Reddit API
praw is Python Reddit API Wrapper
Flair is an indication similar to hashtags
3.1
URL: https://www.youtube.com/watch?v=SkK6ejOS-XE
Week: "3"
Facebook
Features are quite different
Facebook is bidirectional
Connections are more personal
FbI (Facebook Inspector)
Just like TweetCred
Method:
Web of Trust Scores
Given a domain, it comes with two scales:
Rating Scale: Website's Reputation
Confidence
3.2
URL: https://www.youtube.com/watch?v=4C4P_tzjthc
Week: "3"
API Keys
Do not share API Keys (Lol)
Privacy
Every context has different privacy expectations
Westin's 3 Categories
Fundamentalists - 25%
Pragmatists - 60%
Unconcerned - 15%
"On the internet, nobody knows you're a dog"
Today even a black dog can be identified via the internet
#privacyindia12
Data was collected regarding privacy attitudes
It had good demographic coverage
Many thought their data was safe because they had
configured their privacy settings; secondly, some people
were concerned; thirdly, some people didn't care
Many said they would accept a friend request from a
good-looking profile or one of the opposite gender
3.T.1
URL: https://www.youtube.com/watch?v=ERvCBJn-9tU
Week: "3"
Twitter API
Read/Write based tokens
Advertising Management is also provided
Tweepy - Python Wrapper
Search API
Based on query
Based on geocode
Based on Language
recent/popular/mixed
Based on count
Only works for last 7 days
Streaming API
Real time API
Create a Listener
Filter on English
Cannot get all tweets (only a sample)
4.1
URL: https://www.youtube.com/watch?v=oiqJZ3FIX_w
Week: "4"
Privacy is hard to define
Some argue it cannot be usefully addressed at all
The claim of individuals/groups to determine for
themselves when, how, and to what extent information about
them is communicated to others
Control of information
Forms of Privacy
Information
Internet
Communication
Telephone
Territorial
Living Space
Bodily
Self
In 2015, 1.8B photos were uploaded to OSM every day
Many things are colluding
Increasing public self-disclosure
Improving Face Recognition
Cloud Computing
Technique for Reidentifying users
Question
Can we use publicly available images (e.g., from Facebook)
with off-the-shelf models to re-identify anyone?
The goal is to
Use unidentified sources (Flickr, CCTVs) and identified
sources (Facebook, LinkedIn, government sites, etc.)
To obtain sensitive information about an individual, like
gender, orientation, SSN, or Aadhaar number
Latanya Sweeney
Combined medical data and Voter List to reidentify users
Independent data sets can be used to reidentify
Experiment 1
Data
(Identified) FB Profiles from one city from USA
(reidentified) Downloaded profiles from one of the popular
websites
Approach
Identified + Unidentified -> Re-identify
PittPatt produces a score of -1.5 to 20
Showed to Mturkers to validate
Results
Highly Likely matches : 6.3%
Highly Likely + Likely: 10.5%
Roughly 1 in 10
Experiment 2
offline to online comparison
Pictures from FB college network to identify students
Data
25k profiles
26k Images
114k Faces
Process
Pics taken of individuals walking in campus
Asked to fill online survey
Match pic from cloud while they were filling survey
Last page of the survey with options of their pictures
Asked to select pics which matched closely
Results
38% were matched correctly
Including one participant who was not even on Facebook
Under 3 seconds of computation
Experiment 3
Combination of 1 and 2
Predicted SSN from public data
Faces / FB data + Public data --> SSN
27% of subjects had the first 5 digits of their SSN
predicted correctly within 4 attempts, starting from their
faces
4.T.1
URL: https://www.youtube.com/watch?v=pEyizxN3K84
Week: "4"
numpy
Arrays are faster
Consumes less memory
Mechanism to specify data types
# creating array with list
a = np.array([1, 2, 3])
# get data type
a.dtype
# creating zeros
np.zeros(10, dtype=int)
# zero matrix
np.zeros((3,3))
# create array with a value ex: 9x9 with 69s
np.full((9,9), 69)
# create a range sort of thing
# ex: [0, 5, 10, 15]
np.arange(0, 20, 5) # (start, end, step)
# create n evenly spaced elements between two numbers
np.linspace(0, 150, 20) # (start, end, n)
# random elements
b = np.random.random((3,3))
# convert to a type
b.astype("int16")
# identity matrix
np.eye(5)
# generate random integer array
# ex: 20 Random Integer between 1 and 150
r = np.array([np.random.randint(1, 150) for i in range(20)])
# shape of the array
a.shape
# number of dimensions
a.ndim
# number of elements
a.size
# accessing nth-last element
# ex: second last
a[-2]
# get i, j in mn matrix
# ex: 2,2
x[2,2]
# Slicing
# ex: selecting every second element
x[::2]
# ex: first 5 elements
x[:5]
# ex: last 5 elements
x[-5:]
# ex: from 4th to 7th element
x[4:7] # 4, 5, 6 elements
# ex: If you want 3 to 8 but every second element
x[3:8:2]
# ex: same as previous but in reverse
x[7:2:-2]
# Concatenate (takes a sequence of arrays)
np.concatenate([x, y])
# along an axis
np.concatenate([x, y], axis=0)
# stack horizontally (column-wise)
np.hstack((x, y))
# stack vertically (row-wise)
np.vstack((x, y))
# Splitting arrays: first reshape into a 4x4 grid
grid = np.arange(16).reshape(4, 4)
# Vertical Split
np.vsplit(grid, [2])
# Horizontal Split
np.hsplit(grid, [2])
# matrix multiplication
np.matmul(x, y)
4.T.2
URL: https://www.youtube.com/watch?v=V-PozDJ7c1A
Week: "4"
pandas
Data analysis library
Has a lot of inbuilt functionality
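A minimal pandas sketch of the usual operations, using hypothetical tweet data (column names are illustrative):

```python
import pandas as pd

# Hypothetical tweet data
df = pd.DataFrame({
    "user": ["a", "b", "a", "c"],
    "retweets": [3, 10, 5, 0],
})

df.head()                              # first rows
df.describe()                          # summary statistics
df[df["retweets"] > 2]                 # boolean filtering
df.groupby("user")["retweets"].sum()   # aggregate per user
```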
5.1
URL: https://www.youtube.com/watch?v=rPmjaAB8AAk
Week: "5"
Foursquare
Foursquare can be used to trace where people live
You check-in into a place
You can leave tips at places
You become mayor if you check in to a place the most over
60 days
You get free parking spots if you are mayor
Policing
Hudson River incident was the first time people started
using Social Media to help folks
#myNYPD to #myLAPD
Purpose
Keep Citizens Informed
Encourage citizens to post
Cash reward
Appreciation of Police Officers
There are a lot of fake handles.
Verified Accounts are essential
5.2
URL: https://www.youtube.com/watch?v=d9T9VVoUcKE
Week: "5"
Objective
Whether OSN can support Police to get actionable information
about crime and resident's opinions about policing
activities in urban cities of India
Methodology
Collect Data from BLR City police
Filter posts & comments for relevance
1.6K comments and 250 Posts
Data Coding
1. Content Based
Missing
Query
Traffic
2. Style
Formal
Informal
3. Type
- Acknowledge to
- Reply to
- Follow-up by
- Ignored by
Lexical Analysis using word trees
Result
It would be better if the complaints provide details like
time and/or location
Communication from citizens was formal when complaining and
informal when appreciating
Communication from the police was always formal
Engagement Type
Mostly Acknowledging
Replying
Follow up
Ignore (1/3)
Understanding Victimization
Using textual content to see what is victimizing the
citizens
Accountability
Can be measured with response time
Or with how few posts are ignored
Citizens also accept that they too are involved
"Why" is a keyword here too
Understanding Needs/Wants
Resident Expectations like needs and wants
Meeting Expectations -> Increased Safety
Keywords: need-to-be, want-to-see
The way forward
Actionable information can be collected
Mutual Accountability
Understanding fear and Victimization effects
5.3
URL: https://www.youtube.com/watch?v=Ao_ZuLPVlP8
Week: "5"
Measuring Human Behavior
Can we quantify attributes of communication?
Identify behavioral attributes like
Affective Expression
Engagement
Social and Cognitive response
Type of Interaction
1. c-2-c
2. p-2-c
3. p-2-p
4. c-2-p
Research Questions
1. RQ_1: Topical Characteristics
2. RQ_2: Engagement Characteristics
3. RQ_3: Emotional Exchanges
4. RQ_4: Cognitive and Social Orientation
Methodology
1. Topics
1. N Gram Analysis
2. K-means Clustering
2. Engagement
1. Number of police/citizen comments on posts
2. Distinct citizens who comment on posts
3. Average number of likes and comments
3. Emotional (LIWC and Anew Dictionary)
1. Valence - Positivity and Negativity
2. Arousal - Intensity
4. Social and Cognitive (LIWC)
1. Interpersonal Focus
2. Social Orientation
3. Cognition
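The engagement metrics above (distinct commenters, average comments and likes) can be sketched over toy thread data; the data structure here is an assumption, not the paper's:

```python
# Hypothetical comment threads: post id -> list of (commenter, likes)
threads = {
    "p1": [("alice", 2), ("bob", 0), ("alice", 1)],
    "p2": [("carol", 5)],
}

# Distinct citizens commenting per post
distinct = {p: len({u for u, _ in c}) for p, c in threads.items()}

# Average comments per post, and average likes per comment
n_comments = sum(len(c) for c in threads.values())
avg_comments = n_comments / len(threads)
avg_likes = sum(l for c in threads.values() for _, l in c) / n_comments
```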
Clusters of Topic
Police initiated discussions are more focused
"Please take action" was the most common word
Engagement Characteristics
When police make a post there is much more interaction
When police suggest an appropriate action, the discussion
tends to close early, resulting in lower interaction
Emotional Characteristics
Negative sentiments are higher in citizen initiated threads
because they are expressing some issue
Anxiety reduces when the police start commenting on a
citizen post
Arousal is also high because they want the police to get the
work done
Social and Cognitive
Discussion threads involving just the citizens are highly
self-attention focused
Why it matters?
Helps police improve policing and community sensing
Enable emotional support to residents experiencing safety
concerns
Tech Implications
Help gauge changing emotions and behavior
Sense and record citizens' reactions and share them with
decision makers
6.1
URL: https://www.youtube.com/watch?v=z1IqDHJm6N0
Week: "6"
eCrimes in OSM
Phishing
The act of tricking someone into handing over their login
credentials in order to exploit personal information
Spear Phishing: Target specific people
Whaling: Specific CEOs are targeted
Link in FB saying "There was some issue with Facebook login,
click here to solve it"
Example:
FB tech support DMs you
New Login system
Fake customer service accounts
File a complaint and tag legitimate accounts
Fake accounts would reply as if it is them and try to ask
you for credentials
Fake comments on popular posts
Lots of 18+ clickbait
These lead to credit card phishing
More Fake stuff
Fake online discounts
Fake online survey and contests
Fake live streaming videos
Fake tip: Foursquare Spam
Social Reputation
Folks respect you based on the number of followers you have
Social status dictates reputation
A lot of them are manipulated:
Paid good reviews for bad products
Fake followers
Click baiting
Getting you to click on links
#hijacking
Using a hashtag to sell products or push content that has
nothing to do with the hashtag
Compromised Account
Hacked accounts posting wrong information or other bad
activity
Impersonation
Pretending to be someone else
Work from home scam
Want to earn money sitting at home? Click here
I earned 80 bucks doing nothing
6.T.4
URL: https://www.youtube.com/watch?v=d6bi0QTaX5Y
Week: "6"
Social Network Analysis
This is an example of a social-network graph; nodes are
users and edges are follow relationships
SNA Metrics
Centrality
Indegree: Most influential
Outdegree: Who disseminates information
Betweenness: Quickly approachable; basically a node through
which many nodes reach other sets of nodes (it lies on many
shortest paths)
Closeness: How close a node is to all other nodes
Community & Modularity
How would a computer understand a graph
CSV is also used
Adjacency Matrix
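From an adjacency matrix, the degree centralities above fall out as row and column sums; the small directed graph here is an illustration:

```python
import numpy as np

# Directed adjacency matrix: A[i, j] = 1 means node i links to node j
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])

out_degree = A.sum(axis=1)   # links a node sends (who disseminates)
in_degree = A.sum(axis=0)    # links a node receives (influence)
```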
GraphML Format
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="d0" for="node" attr.name="color" attr.type="string">
<default>yellow</default>
</key>
<key id="d1" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="undirected">
<node id="n0">
<data key="d0">green</data>
</node>
<node id="n1"/>
<node id="n2">
<data key="d0">blue</data>
</node>
<node id="n3">
<data key="d0">red</data>
</node>
<node id="n4"/>
<node id="n5">
<data key="d0">turquoise</data>
</node>
<edge id="e0" source="n0" target="n2">
<data key="d1">1.0</data>
</edge>
<edge id="e1" source="n0" target="n1">
<data key="d1">1.0</data>
</edge>
<edge id="e2" source="n1" target="n3">
<data key="d1">2.0</data>
</edge>
<edge id="e3" source="n3" target="n2"/>
<edge id="e4" source="n2" target="n4"/>
<edge id="e5" source="n3" target="n5"/>
<edge id="e6" source="n5" target="n4">
<data key="d1">1.1</data>
</edge>
</graph>
</graphml>
<node id="n1"> represents a node
<edge source="n1" target="n2"> represents an edge
<data key="d0">green</data> represents a data
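Since GraphML is just XML, the node/edge/data elements above can be read with the standard library; note the namespace must be passed explicitly:

```python
import xml.etree.ElementTree as ET

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/><node id="n1"/><node id="n2"/>
    <edge id="e0" source="n0" target="n1"/>
    <edge id="e1" source="n1" target="n2"/>
  </graph>
</graphml>"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}
root = ET.fromstring(GRAPHML)
nodes = root.findall(".//g:node", NS)
edges = [(e.get("source"), e.get("target"))
         for e in root.findall(".//g:edge", NS)]
```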
twecoll is a library that helps us get our graph
twecoll init username initializes the service
Then it asks for the consumer key; provide it
Then it asks for the consumer secret; provide it
Then it generates a link; open it and authorize
Then setup is done
Run python twecoll fetch username; it will get your
followers and their followers as well
Then you need to build edges between your followers and
their followers
Run python twecoll edgelist username; it will create a GML
file
Use Gephi to view the GML file
In case the graph is dense, try choosing different layouts
Modularity -> Communities
You can color the nodes by selecting color by attribute as
Modularity Class
Size based on Indegrees is better to visualize
7.1
URL: https://www.youtube.com/watch?v=oxXCzyRdTio
Week: "7"
Spammers
Top 100,000 spam followers account for 60% of all links
acquired by the spammers
Top spam-followers tend to reciprocate all links established
to them by spammers
Spammers try to increase their in-links
If I have fewer followers it is likely that I won't
reciprocate a follow from a spammer; responsiveness
increases with follower count
Link Farmers
Even actual legitimate people do some link farming
Top link farmers are not spammers
Top Link farmers have very high indegree and outdegree
Their in/out-degree ratio is also close to 1
Account bio of top 100K and Random sample
The top 100K promote their own business, content, or trends
with links to legitimate external sources
Random accounts don't tweet links to external sources
7.2
URL: https://www.youtube.com/watch?v=lz_IivUTQjk
Week: "7"
Research has shown that almost nobody reads terms and
conditions or privacy policies
National opportunity cost of reading privacy policies =
$781B
Facemail from MIT
You don't always realize exactly whom you are sending mail
to
This tool lets you visualize the recipients' faces
Experimental Setup
Picture Nudge: a Chrome extension that tells you "these
people, your friends and FRIENDS OF FRIENDS, can see your
post"
Timer Nudge: you have 10 seconds to cancel your post
Sentiment Nudge: warns that folks might perceive the post
as negative
Methodology
IRB approval is required
Recruitment from craigslist, flyers etc.
Analysis Metrics
Number of changes to inline privacy settings
Number of cancelled or edited posts
Post Frequency
Topic Sensitivity
Profile Picture Nudge
One participant hid posts from her acquaintances when
complaining about her job
Another participant became self-conscious and didn't post
many pictures
Timer Nudge
At times annoying and at times handy
Wait for the timer to expire or hit "post now"
One participant said it "made me think about the posts"
Cancelled a few because of thinking
Sentiment Nudge
It was losing the context
Many cancelled because of this nudge
Post Frequency Dropped: 13 -> 7
Conclusion
Intervention helps user make better decision
More work is needed to understand which works when
7.3
URL: https://www.youtube.com/watch?v=AfTNyw3_TdE
Week: "7"
Semantic Attack
An attack that targets humans: it exploits the way we, as
humans, assign meaning to content
Semantic barrier: the gap between what you are doing and
what the system thinks you are doing
Things that make up a phishing email:
Urgency in Subject
Spelling Mistakes
Links take you to random websites
Types of Phishing Attack:
1. Phishing
2. Context-aware phishing / spear phishing
3. Whaling
4. Vishing: Over phone
5. Smishing: Over SMS
6. Social Phishing
Social Phishing
Using social data that is available to perform phishing
attack
How can phishing attacks be performed by collecting
personal information from social networks?
Methodology
Collected public information using tools like Perl LWP
library
Correlated it with IU's address book database
Control vs. Experiment
Control: the email comes from an IU email ID, but from an
unknown person
Experimental: the email comes from a friend at IU
Flow
1. Public data is harvested
2. Data is correlated and stored in RDB
3. Heuristics are used to craft spoofed email message by Eve
4. Message is sent to bob
5. Bob follows the link contained and is sent to an unchecked
redirect
6. Bob is sent to attacker whuffo.com
7. Bob is asked for creds
8. Bob's creds are verified with university authenticator
9. Then
a. Bob is phished
b. Not phished, could try again
Victims
Control: 16%, which is higher than usual
Social: 72%, consistent with other experiments
Success rate
70% authentication in first 12hrs
Takedown has to be quick to be effective
Younger targets were more vulnerable
The science department had the maximum difference between
control and social
Technology had the lowest victims #satisfying
Repeated Authentications
The users actually tried again because of the overload
message
Some even tried 80 times
Gender
Phishing from an opposite-gender "friend" had the highest
success rate
Male-to-male had the lowest
Females seem to be more vulnerable
Reactions
Anger
Some demanded the researchers be fired
Psychological cost
Unethical and illegal
Denial
Nobody accepted they fell for it
Misunderstanding over spoofing emails
Underestimation of publicly available information
Conclusions
Extensive Educational Campaigns
Browser Solutions
Digitally signed emails
OSM provides a lot more information
7.T.5
URL: https://www.youtube.com/watch?v=Wqrea2rTV7I
Week: "6"
nltk
NLP library for string operations on text data
import nltk
import string
# Gives ["this", "is", ",", "PSOSM"]
nltk.tokenize.word_tokenize("this is, PSOSM")
# clean the data by lowercasing
some_string.lower()
# remove punctuation
obj = str.maketrans("", "", string.punctuation)
tokens = [i.translate(obj) for i in tokens]
# remove stopwords (and very short tokens)
stop_words = set(nltk.corpus.stopwords.words("english"))
tokens = [i for i in tokens if i not in stop_words and len(i) > 2]
8-9-10-11
Week: 8, 9, 10, 11
8.1
De-duplicating audience
Profile Linking approach
Values change over time: people's usernames change
Profile pictures and descriptions change very often
Given two user profiles and their respective username
sets, each composed of past and current usernames, find if
the profiles refer to a single individual
Ground truth: users connect their other handles via links
Past usernames were collected
An SVM classifier gives high accuracy
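One plausible username-similarity feature for such a classifier is Jaccard overlap of character bigrams; this is an illustrative feature sketch, not the paper's actual feature set:

```python
def bigrams(s):
    """Set of character bigrams of a lowercased username."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a, b):
    """Jaccard similarity of two usernames' bigram sets, in [0, 1]."""
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / len(ba | bb) if ba | bb else 0.0
```

Similar usernames ("alice_01" vs. "alice01") score high; unrelated ones score near zero, and such scores can feed a classifier as features.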
Measuring volume of sentiments
8.2
Anonymous Networks
4chan
Whisper
Secret
Yik Yak
Wickr
Why do we need anonymous networks?
Increasing awareness
Snowden Disclosure
PRISM surveillance program
Bal Thackeray Incident
What is Whisper?
GUID: Globally Unique Identifier, removed in 2014
55% get no replies in Whisper
94% replies in one day
Unlikely to get attention later
80% post fewer than 10 total whispers
15% only reply
30% only post, never reply
Average degree is very high
Very low clustering
No small world phenomenon
Assortativity is very low -> resembles a random graph
18% content deleted compared to 4% in twitter
70% of deleted whispers are deleted within a week
2% stay after a month
In 90% of cases the two users are co-located in the same
"state"
75% have a distance < 40 miles between them
Smaller user population in same nearby area, higher chance
of encounter
Active people have higher chances of meeting
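Distances like the "< 40 miles" figure above are typically computed from coordinate pairs with the haversine formula; a small sketch:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    R = 3958.8  # Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))
```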
New people -> New Posts does not work here
Users disengage
New users make 20% of the content
8.T.6
Gephi is basically a graph visualization tool
You can give nodes and edges as csv
Stats tab
Filters tab
Range for Followers, Out degree
Intersection to combine filters
Network Diameter
Density
Modularity
Page Rank
Connected Components
Layout allows you to visualize in different ways
Node Size according to data
Color according to Modularity
Direct Selection
Rectangle Selection
Drag Tool
Painter Tool
Node Pencil
Edge Pencil
Edit tool
Camera Button
Preview Tab
Select in Data Laboratory
Show in overview button
Tag cloud
Export filtered to new workspace
9.1
Location Based services
1. Foursquare
2. Yelp
3. Gowalla
4. Facebook
5. Twitter
Perceptions in OSM and Mobile Network
Mayorships is an incentive
pleaserobme.com looked at tweets: if a user talks about a
location X while belonging to another location, it means
they are travelling (and their home may be empty)
People have designed cities based on data from Foursquare
Badges and mayorships: gamification of apps
Users can post tips, can serve as feedback
done or to-do
9.2
we are interested in done, to-do and mayorship
yahoo/geo/placefinder
Few users have many mayorships; most have only one
Few cities have many mayorships; most have very few
Some found correlation between number of mayorships and
number of tips and dones
New york is common in tips and dones
There are chances that there are mayorships but no tips
Dones are sparser
Lots of tips are generated 1hr apart
70% of the users have average distance of 150km
10% have 6000km
Frequency is 24hrs in most cases
Take all users and classify
9.T.7
python-highcharts
chart = Highchart()
options = {
    'chart': {'type': 'bar'},
    'title': {'text': 'Highchart bar'},
    'legend': {'enabled': True},
    'xAxis': {'categories': ['User 1', 'User 2']},
    'yAxis': {'title': {'text': 'Number of Followers'}},
}
data1 = [123, 2323, 234, 34]
data2 = [238, 132, 1223, 3443]
chart.set_dict_options(options)
chart.add_data_set(data1, 'bar', name='day1')
chart.add_data_set(data2, 'bar', name='day2')
chart.save_file('bar-chart')
10.1
Location based on other social networks
Twitter is the highest
Only 0.5% of users were geo-tagged
reproducibility of the research