Lectures-merged
2.1
URL: https://www.youtube.com/watch?v=WYB6V0gTJps
Week: "2"
OSM API
Lets you interact with social media platforms
programmatically so users can collect data
Each platform has its own API and rate limits
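Since every platform rate-limits its API, collection code has to back off and retry. A minimal sketch, assuming a hypothetical `fetch_page` callable (not any real client) that returns a dict with a `status` field:

```python
import time

def fetch_with_backoff(fetch_page, max_retries=5):
    """Call a page-fetching function, sleeping and retrying on rate limits.

    fetch_page is a hypothetical callable returning a dict like
    {"status": 200, "items": [...]} or {"status": 429, "retry_after": n}.
    """
    delay = 1
    for _ in range(max_retries):
        resp = fetch_page()
        if resp["status"] == 429:                 # rate limited
            time.sleep(resp.get("retry_after", delay))
            delay *= 2                            # exponential backoff
            continue
        return resp["items"]
    raise RuntimeError("rate limit: retries exhausted")
```

Real clients (Tweepy, PRAW) offer similar behaviour via options like `wait_on_rate_limit`.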
Python
Has a lot of libraries and we will be using it
Data Formats
JSON
XML
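Both formats can represent the same record; the field names below are illustrative, and both parsers are in the Python standard library:

```python
import json
import xml.etree.ElementTree as ET

# The same (hypothetical) post record in both formats
as_json = '{"id": 42, "user": "alice", "text": "hello"}'
as_xml = '<post id="42"><user>alice</user><text>hello</text></post>'

post = json.loads(as_json)        # JSON -> dict
root = ET.fromstring(as_xml)      # XML -> element tree

post["user"]                      # "alice"
root.find("user").text            # "alice"
```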
MySQL
Relational Database to store data, stores data in rows and
columns
Viewed with PhpMyAdmin
MongoDB
Data is stored as objects (documents)
The instructor recommends RoboMongo, but I would prefer
Compose
Graph API
All content in Meta is stored in graph format, and every
interaction is an edge in the graph
All nodes have a unique numeric id
2.2
URL: https://www.youtube.com/watch?v=yf4jNe3X_mg
Week: "2"
Trust and Credibility
The graph shows that many tweets carry rumors and that a
variety of rumors can spread.
True information arrives later than the rumors.
Our Objective:
How to identify rumors and stop them from spreading
How to surface true information earlier
Methodology
1. Data Collection and Filtering
2. Data Characterization
3. Classification Module
1. Feature Generation
2. Obtaining Ground Truth
4. Evaluating Results
Data Collection
Total tweets: 1.7M
Total Unique users: 1.17M
Tweets with Links: 0.6M
Origin of Tweet - Geo Code
Data Filtering
The Guardian manually annotated the data and distributed it
publicly
Fake images were about twice as many as real images
Analysis Helps: Who, When, Where, What, Why and How
Network Analysis:
Within an hour the spread becomes exponential
The user who started a rumor may not be its biggest
spreader
Classification
Some features of the tweets
1. User Profile
2. Network of the User
3. Content (Tweet feature, has a lot of dimensions)
Results:
Data Description in Boston Blast
Spiking peaks as real world events unfold
Other Details in Boston Blast
Fake accounts were created to spread malicious data
Some suspensions were based on people reporting
Some have 73k followers and are still active
True data was mostly posted via web and not mobile
Network Analysis shows that Fake Accounts are a closed
community
Cronbach's alpha tells us how reliable annotated
information is; it is an inter-annotator (inter-rater)
agreement measure. It should be above 0.7 to be acceptable.
Annotation is mostly done manually by people; I doubt that
is still the case now
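The reliability measure above can be sketched directly from its formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the toy ratings matrix is an assumption:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha; annotators in columns, subjects in rows."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                        # number of annotators
    item_var = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)
```

Perfectly agreeing annotators yield alpha = 1; values above 0.7 are conventionally acceptable.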
TweetCred is a Chrome extension that estimates the
credibility of a tweet; it also takes user feedback
NDCG (Normalized Discounted Cumulative Gain)
A metric that evaluates how well a ranking algorithm
organizes items based on relevance
NDCG = DCG divided by the ideal DCG (IDCG)
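The DCG/IDCG ratio can be computed in a few lines; this sketch uses the common log2 discount with 1-indexed positions:

```python
import math

def dcg(relevances):
    # DCG = sum of rel_i / log2(i + 1) over 1-indexed positions i
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))  # best possible ordering
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking already sorted by relevance scores NDCG = 1; any worse ordering scores below 1.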
2.T
Week: "2"
Reddit API
praw is Python Reddit API Wrapper
Flair is an indication similar to hashtags
3.1
URL: https://www.youtube.com/watch?v=SkK6ejOS-XE
Week: "3"
Facebook
Features are quite different
Facebook is bidirectional
Connections are more personal
FbI (Facebook Inspector)
Just like TweetCred
Method:
Web of Trust Scores
Given a domain, it comes with two scales:
Rating Scale: Website's Reputation
Confidence
3.2
URL: https://www.youtube.com/watch?v=4C4P_tzjthc
Week: "3"
API Keys
Do not share API Keys (Lol)
Privacy
Every context has different privacy expectations
Westin's 3 Categories
Fundamentalists - 25%
Pragmatists - 60%
Unconcerned - 15%
"On the internet, nobody knows you're a dog"
Today even a black dog can be identified via the internet
#privacyindia12
Data was collected regarding privacy attitudes
It had good demographic coverage
Many thought their data was safe because they had
configured their privacy settings; secondly, some people
were concerned; thirdly, some people didn't care
Many said they would accept a friend request from a
good-looking profile or one of the opposite gender
3.T.1
URL: https://www.youtube.com/watch?v=ERvCBJn-9tU
Week: "3"
Twitter API
Read/Write based tokens
Advertising Management is also provided
Tweepy - Python Wrapper
Search API
Based on query
Based on geocode
Based on Language
recent/popular/mixed
Based on count
Only works for last 7 days
Streaming API
Real time API
Create a Listener
Filter on English
Cannot get all tweets (only a sample)
4.1
URL: https://www.youtube.com/watch?v=oiqJZ3FIX_w
Week: "4"
Privacy is hard to define
Some argue it cannot be usefully addressed at all
The claim of individuals/groups to determine for
themselves when, how, and to what extent information about
them is communicated to others
Control of information
Forms of Privacy
Information
Internet
Communication
Telephone
Territorial
Living Space
Bodily
Self
In 2015, 1.8B photos were uploaded to OSM every day
Many things are colluding
Increasing public self-disclosure
Improving Face Recognition
Cloud Computing
Technique for Reidentifying users
Question
Can we use publicly available images (e.g., from Facebook)
with off-the-shelf models to re-identify anyone?
The goal is to
Use unidentified sources (Flickr, CCTVs) and identified
sources (Facebook, LinkedIn, government sites, etc.)
To obtain sensitive information about an individual, like
gender, orientation, SSN, or Aadhaar number
Latanya Sweeney
Combined medical data and Voter List to reidentify users
Independent data sets can be used to reidentify
Experiment 1
Data
(Identified) FB Profiles from one city from USA
(reidentified) Downloaded profiles from one of the popular
websites
Approach
Identified + Unidentified -> Re-identify
PittPatt produces a score of -1.5 to 20
Showed to Mturkers to validate
Results
Highly Likely matches : 6.3%
Highly Likely + Likely: 10.5%
Roughly 1 in 10
Experiment 2
offline to online comparison
Pictures from FB college network to identify students
Data
25k profiles
26k Images
114k Faces
Process
Pics taken of individuals walking in campus
Asked to fill online survey
Match pic from cloud while they were filling survey
Last page of the survey with options of their pictures
Asked to select pics which matched closely
Results
38% were matched correctly
Including one participant who was not even on Facebook
Under 3 seconds of computation
Experiment 3
Combination of 1 and 2
Predicted SSN from public data
Faces / FB data + Public data --> SSN
27% of subjects had the first 5 digits of their SSN
predicted correctly within 4 attempts, starting from their
faces
4.T.1
URL: https://www.youtube.com/watch?v=pEyizxN3K84
Week: "4"
numpy
Arrays are faster
Consumes less memory
Mechanism to specify data types
# creating array with list
a = np.array([1, 2, 3])
# get data type
a.dtype
# creating zeros
np.zeros(10, dtype=int)
# zero matrix
np.zeros((3,3))
# create array with a value ex: 9x9 with 69s
np.full((9,9), 69)
# create a range sort of thing
# ex: [0, 5, 10, 15]
np.arange(0, 20, 5) # (start, end, step)
# create n evenly spaced elements between two numbers
np.linspace(0, 150, 20) # (start, end, n)
# random elements
b = np.random.random((3,3))
# convert to a type
b.astype("int16")
# identity matrix
np.eye(5)
# generate random integer array
# ex: 20 Random Integer between 1 and 150
r = np.array([np.random.randint(1, 150) for i in range(20)])
# shape of the array
a.shape
# number of dimensions
a.ndim
# number of elements
a.size
# accessing nth-last element
# ex: second last
a[-2]
# get i, j in mn matrix
# ex: 2,2
x[2,2]
# Slicing
# ex: selecting every second element
x[::2]
# ex: first 5 elements
x[:5]
# ex: last 5 elements
x[-5:]
# ex: from 4th to 7th element
x[4:7] # 4, 5, 6 elements
# ex: If you want 3 to 8 but every second element
x[3:8:2]
# ex: same as previous but in reverse
x[7:2:-2]
# Concatenate (takes a sequence of arrays)
np.concatenate([x, y])
# along an axis
np.concatenate([x, y], axis=0)
# stack horizontally (column-wise)
np.hstack((x, y))
# stack vertically (row-wise)
np.vstack((x, y))
# Splitting arrays: first reshape into a 4x4 grid
grid = np.arange(16).reshape(4, 4)
# Vertical Split
np.vsplit(grid, [2])
# Horizontal Split
np.hsplit(grid, [2])
# matrix multiplication
np.matmul(x, y)
4.T.2
URL: https://www.youtube.com/watch?v=V-PozDJ7c1A
Week: "4"
pandas
Data analysis library
Has a lot of inbuilt functionality
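A minimal pandas sketch of the usual operations, using hypothetical tweet data (column names are illustrative):

```python
import pandas as pd

# Hypothetical tweet data
df = pd.DataFrame({
    "user": ["a", "b", "a", "c"],
    "retweets": [3, 10, 5, 0],
})

df.head()                              # first rows
df.describe()                          # summary statistics
df[df["retweets"] > 2]                 # boolean filtering
df.groupby("user")["retweets"].sum()   # aggregate per user
```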
5.1
URL: https://www.youtube.com/watch?v=rPmjaAB8AAk
Week: "5"
Foursquare
Foursquare can be used to trace where people live
You check-in into a place
You can leave tips at places
You become mayor if you check in to a place the most over
60 days
You get free parking spots if you are mayor
Policing
Hudson River incident was the first time people started
using Social Media to help folks
#myNYPD to #myLAPD
Purpose
Keep Citizens Informed
Encourage citizens to post
Cash reward
Appreciation of Police Officers
There are a lot of fake handles.
Verified Accounts are essential
5.2
URL: https://www.youtube.com/watch?v=d9T9VVoUcKE
Week: "5"
Objective
Whether OSN can support Police to get actionable information
about crime and resident's opinions about policing
activities in urban cities of India
Methodology
Collect Data from BLR City police
Filter posts & comments for relevance
1.6K comments and 250 Posts
Data Coding
1. Content Based
Missing
Query
Traffic
2. Style
Formal
Informal
3. Type
- Acknowledge to
- Reply to
- Follow-up by
- Ignored by
Lexical Analysis using word trees
Result
It would be better if the complaints provide details like
time and/or location
Communication from citizens was formal when complaining and
informal when appreciating
Communication from the police was always formal
Engagement Type
Mostly Acknowledging
Replying
Follow up
Ignore (1/3)
Understanding Victimization
Using textual content to see what is victimizing the
citizens
Accountability
Can be measured with response time
Or with how few posts are ignored
Citizens also accept that they too are involved
"Why" is a keyword here too
Understanding Needs/Wants
Resident Expectations like needs and wants
Meeting Expectations -> Increased Safety
Keywords: need-to-be, want-to-see
The way forward
Actionable information can be collected
Mutual Accountability
Understanding fear and Victimization effects
5.3
URL: https://www.youtube.com/watch?v=Ao_ZuLPVlP8
Week: "5"
Measuring Human Behavior
Can we quantify attributes of communication?
Identify behavioral attributes like
Affective Expression
Engagement
Social and Cognitive response
Type of Interaction
1. c-2-c
2. p-2-c
3. p-2-p
4. c-2-p
Research Questions
1. RQ_1: Topical Characteristics
2. RQ_2: Engagement Characteristics
3. RQ_3: Emotional Exchanges
4. RQ_4: Cognitive and Social Orientation
Methodology
1. Topics
1. N Gram Analysis
2. K-means Clustering
2. Engagement
1. Number of police/citizen comments on posts
2. Distinct citizens who comment on posts
3. Average number of likes and comments
3. Emotional (LIWC and Anew Dictionary)
1. Valence - Positivity and Negativity
2. Arousal - Intensity
4. Social and Cognitive (LIWC)
1. Interpersonal Focus
2. Social Orientation
3. Cognition
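The engagement metrics above (distinct commenters, average comments and likes) can be sketched over toy thread data; the data structure here is an assumption, not the paper's:

```python
# Hypothetical comment threads: post id -> list of (commenter, likes)
threads = {
    "p1": [("alice", 2), ("bob", 0), ("alice", 1)],
    "p2": [("carol", 5)],
}

# Distinct citizens commenting per post
distinct = {p: len({u for u, _ in c}) for p, c in threads.items()}

# Average comments per post, and average likes per comment
n_comments = sum(len(c) for c in threads.values())
avg_comments = n_comments / len(threads)
avg_likes = sum(l for c in threads.values() for _, l in c) / n_comments
```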
Clusters of Topic
Police initiated discussions are more focused
"Please take action" was the most common word
Engagement Characteristics
When police make a post there is much more interaction
When police suggest an appropriate action, the discussion
tends to close early, resulting in lower interaction
Emotional Characteristics
Negative sentiments are higher in citizen initiated threads
because they are expressing some issue
Anxiety reduces when the police start commenting on a
citizen post
Arousal is also high because they want the police to get the
work done
Social and Cognitive
Discussion threads involving just the citizens are highly
self-attention focused
Why it matters?
Helps police improve policing and community sensing
Enable emotional support to residents experiencing safety
concerns
Tech Implications
Help gauge changing emotions and behavior
Sense and record citizens' reactions and share them with
decision makers
6.1
URL: https://www.youtube.com/watch?v=z1IqDHJm6N0
Week: "6"
eCrimes in OSM
Phishing
The act of tricking someone into handing over their login
credentials in order to exploit personal information
Spear Phishing: Target specific people
Whaling: Specific CEOs are targeted
Link in FB saying "There was some issue with Facebook login,
click here to solve it"
Example:
FB tech support DMs you
New Login system
Fake customer service accounts
File a complaint and tag legitimate accounts
Fake accounts would reply as if it is them and try to ask
you for credentials
Fake comments on popular posts
Lots of 18+ clickbait
These lead to credit card phishing
More Fake stuff
Fake online discounts
Fake online survey and contests
Fake live streaming videos
Fake tip: Foursquare Spam
Social Reputation
Folks respect you based on the number of followers you have
Social status dictates reputation
A lot of them are manipulated:
Paid good reviews for bad products
Fake followers
Click baiting
Getting you to click on links
#hijacking
Using a hashtag to sell products or push content that has
nothing to do with the hashtag
Compromised Account
Hacked accounts posting wrong information or other bad
activity
Impersonation
Pretending to be someone else
Work from home scam
Want to earn money sitting at home? Click here
I earned 80 bucks doing nothing
6.T.4
URL: https://www.youtube.com/watch?v=d6bi0QTaX5Y
Week: "6"
Social Network Analysis
This is an example of a social-network graph; nodes are
users and edges are follow relationships
SNA Metrics
Centrality
Indegree: Most influential
Outdegree: Who disseminates information
Betweenness: Quickly approachable; basically a node through
which many nodes reach other sets of nodes (it lies on many
shortest paths)
Closeness: How close a node is to all other nodes
Community & Modularity
How would a computer understand a graph
CSV is also used
Adjacency Matrix
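From an adjacency matrix, the degree centralities above fall out as row and column sums; the small directed graph here is an illustration:

```python
import numpy as np

# Directed adjacency matrix: A[i, j] = 1 means node i links to node j
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])

out_degree = A.sum(axis=1)   # links a node sends (who disseminates)
in_degree = A.sum(axis=0)    # links a node receives (influence)
```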
GraphML Format
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="d0" for="node" attr.name="color" attr.type="string">
<default>yellow</default>
</key>
<key id="d1" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="undirected">
<node id="n0">
<data key="d0">green</data>
</node>
<node id="n1"/>
<node id="n2">
<data key="d0">blue</data>
</node>
<node id="n3">
<data key="d0">red</data>
</node>
<node id="n4"/>
<node id="n5">
<data key="d0">turquoise</data>
</node>
<edge id="e0" source="n0" target="n2">
<data key="d1">1.0</data>
</edge>
<edge id="e1" source="n0" target="n1">
<data key="d1">1.0</data>
</edge>
<edge id="e2" source="n1" target="n3">
<data key="d1">2.0</data>
</edge>
<edge id="e3" source="n3" target="n2"/>
<edge id="e4" source="n2" target="n4"/>
<edge id="e5" source="n3" target="n5"/>
<edge id="e6" source="n5" target="n4">
<data key="d1">1.1</data>
</edge>
</graph>
</graphml>
<node id="n1"> represents a node
<edge source="n1" target="n2"> represents an edge
<data key="d0">green</data> represents a data
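Since GraphML is just XML, the node/edge/data elements above can be read with the standard library; note the namespace must be passed explicitly:

```python
import xml.etree.ElementTree as ET

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/><node id="n1"/><node id="n2"/>
    <edge id="e0" source="n0" target="n1"/>
    <edge id="e1" source="n1" target="n2"/>
  </graph>
</graphml>"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}
root = ET.fromstring(GRAPHML)
nodes = root.findall(".//g:node", NS)
edges = [(e.get("source"), e.get("target"))
         for e in root.findall(".//g:edge", NS)]
```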
twecoll is a library that helps us get our graph
twecoll init username initializes the service
Then it asks for the consumer key; provide it
Then it asks for the consumer secret; provide it
Then it generates a link; open it and authorize
Then setup is done
Run python twecoll fetch username; it will get your
followers and their followers as well
Then you need to build edges between your followers and
their followers
Run python twecoll edgelist username; it will create a GML
file
Use Gephi to view the GML file
In case the graph is dense, try choosing different layouts
Modularity -> Communities
You can color the nodes by selecting color by attribute as
Modularity Class
Size based on Indegrees is better to visualize
7.1
URL: https://www.youtube.com/watch?v=oxXCzyRdTio
Week: "7"
Spammers
Top 100,000 spam followers account for 60% of all links
acquired by the spammers
Top spam-followers tend to reciprocate all links established
to them by spammers
Spammers try to increase their in-links
If I have fewer followers it is likely that I won't
reciprocate a follow from a spammer; responsiveness
increases with follower count
Link Farmers
Even actual legitimate people do some link farming
Top link farmers are not spammers
Top Link farmers have very high indegree and outdegree
Their in/out-degree ratio is also close to 1
Account bio of top 100K and Random sample
The top 100K promote their own business, content, or trends
with links to legitimate external sources
Random accounts don't tweet links to external sources
7.2
URL: https://www.youtube.com/watch?v=lz_IivUTQjk
Week: "7"
Research has shown that almost nobody reads terms and
conditions or privacy policies
National opportunity cost of reading privacy policies =
$781B
Facemail from MIT
You don't always realize exactly whom you are sending mail
to
This tool lets you visualize the recipients' faces
Experimental Setup
Picture Nudge: a Chrome extension that tells you "these
people, your friends and FRIENDS OF FRIENDS, can see your
post"
Timer Nudge: you have 10 seconds to cancel your post
Sentiment Nudge: warns that folks might perceive the post
as negative
Methodology
IRB approval is required
Recruitment from craigslist, flyers etc.
Analysis Metrics
Number of changes to inline privacy settings
Number of cancelled or edited posts
Post Frequency
Topic Sensitivity
Profile Picture Nudge
One participant hid posts from her acquaintances when
complaining about her job
Another participant became self-conscious and didn't post
many pictures
Timer Nudge
At times annoying and at times handy
Wait for the timer to expire or hit "post now"
One participant said it "made me think about the posts"
Cancelled a few because of thinking
Sentiment Nudge
It was losing the context
Many cancelled because of this nudge
Post Frequency Dropped: 13 -> 7
Conclusion
Intervention helps user make better decision
More work is needed to understand which works when
7.3
URL: https://www.youtube.com/watch?v=AfTNyw3_TdE
Week: "7"
Semantic Attack
An attack that targets humans: it exploits the way we, as
humans, assign meaning to content
Semantic barrier: the gap between what you are doing and
what the system thinks you are doing
Things that make up a phishing email:
Urgency in Subject
Spelling Mistakes
Links take you to random websites
Types of Phishing Attack:
1. Phishing
2. Context-aware phishing / spear phishing
3. Whaling
4. Vishing: Over phone
5. Smishing: Over SMS
6. Social Phishing
Social Phishing
Using social data that is available to perform phishing
attack
How can phishing attacks be performed by collecting
personal information from social networks?
Methodology
Collected public information using tools like Perl LWP
library
Correlated it with IU's address book database
Control vs. Experiment
Control: the email comes from an IU email ID, but from an
unknown person
Experimental: the email comes from a friend at IU
Flow
1. Public data is harvested
2. Data is correlated and stored in RDB
3. Heuristics are used to craft spoofed email message by Eve
4. Message is sent to bob
5. Bob follows the link contained and is sent to an unchecked
redirect
6. Bob is sent to attacker whuffo.com
7. Bob is asked for creds
8. Bob's creds are verified with university authenticator
9. Then
a. Bob is phished
b. Not phished, could try again
Victims
Control: 16%, which is higher than usual
Social: 72%, consistent with other experiments
Success rate
70% authentication in first 12hrs
Takedown has to be quick to be effective
Younger targets were more vulnerable
The science department had the maximum difference between
control and social
Technology had the lowest victims #satisfying
Repeated Authentications
The users actually tried again because of the overload
message
Some even tried 80 times
Gender
Phishing from an opposite-gender "friend" had the highest
success rate
Male-to-male had the lowest
Females seem to be more vulnerable
Reactions
Anger
Some demanded the researchers be fired
Psychological cost
Unethical and illegal
Denial
Nobody accepted they fell for it
Misunderstanding over spoofing emails
Underestimation of publicly available information
Conclusions
Extensive Educational Campaigns
Browser Solutions
Digitally signed emails
OSM provides a lot more information
7.T.5
URL: https://www.youtube.com/watch?v=Wqrea2rTV7I
Week: "6"
nltk
NLP library for string operations on text data
import nltk
import string
# Gives ["this", "is", ",", "PSOSM"]
nltk.tokenize.word_tokenize("this is, PSOSM")
# clean the data by lowercasing
some_string.lower()
# remove punctuation
obj = str.maketrans("", "", string.punctuation)
tokens = [i.translate(obj) for i in tokens]
# remove stopwords (and very short tokens)
stop_words = set(nltk.corpus.stopwords.words("english"))
tokens = [i for i in tokens if i not in stop_words and len(i) > 2]
8-9-10-11
Week: 8, 9, 10, 11
8.1
De-duplicating audience
Profile Linking approach
Values change over time: people's usernames change
Profile pictures and descriptions change very often
Given two user profiles and their respective username
sets, each composed of past and current usernames, find if
the profiles refer to a single individual
Ground truth: users connect their other handles via links
Past usernames were collected
An SVM classifier gives high accuracy
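One plausible username-similarity feature for such a classifier is Jaccard overlap of character bigrams; this is an illustrative feature sketch, not the paper's actual feature set:

```python
def bigrams(s):
    """Set of character bigrams of a lowercased username."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a, b):
    """Jaccard similarity of two usernames' bigram sets, in [0, 1]."""
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / len(ba | bb) if ba | bb else 0.0
```

Similar usernames ("alice_01" vs. "alice01") score high; unrelated ones score near zero, and such scores can feed a classifier as features.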
Measuring volume of sentiments
8.2
Anonymous Networks
4chan
Whisper
Secret
Yik Yak
Wickr
Why do we need anonymous networks?
Increasing awareness
Snowden Disclosure
PRISM surveillance program
Bal Thackeray Incident
What is Whisper?
GUID: Globally Unique Identifier, removed in 2014
55% get no replies in Whisper
94% replies in one day
Unlikely to get attention later
80% post fewer than 10 total whispers
15% only reply
30% only post, never reply
Average degree is very high
Very low clustering
No small world phenomenon
Assortativity is very low -> resembles a random graph
18% content deleted compared to 4% in twitter
70% of deleted whispers are deleted within a week
2% stay after a month
In 90% of cases the two users are co-located in the same
"state"
75% have a distance < 40 miles between them
Smaller user population in same nearby area, higher chance
of encounter
Active people have higher chances of meeting
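Distances like the "< 40 miles" figure above are typically computed from coordinate pairs with the haversine formula; a small sketch:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    R = 3958.8  # Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))
```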
New people -> New Posts does not work here
Users disengage
New users make 20% of the content
8.T.6
Gephi is basically a graph visualization tool
You can give nodes and edges as csv
Stats tab
Filters tab
Range for Followers, Out degree
Intersection to combine filters
Network Diameter
Density
Modularity
Page Rank
Connected Components
Layout allows you to visualize in different ways
Node Size according to data
Color according to Modularity
Direct Selection
Rectangle Selection
Drag Tool
Painter Tool
Node Pencil
Edge Pencil
Edit tool
Camera Button
Preview Tab
Select in Data Laboratory
Show in overview button
Tag cloud
Export filtered to new workspace
9.1
Location Based services
1. Foursquare
2. Yelp
3. Gowalla
4. Facebook
5. Twitter
Perceptions in OSM and Mobile Network
Mayorships is an incentive
pleaserobme.com looked at tweets: if a user talks about a
location X while belonging to another location, it means
they are travelling (and their home may be empty)
People have designed cities based on data from Foursquare
Badges and mayorships: gamification of apps
Users can post tips, can serve as feedback
done or to-do
9.2
we are interested in done, to-do and mayorship
yahoo/geo/placefinder
Few users have many mayorships; most have only one
Few cities have many mayorships; most have very few
Some found correlation between number of mayorships and
number of tips and dones
New york is common in tips and dones
There are chances that there are mayorships but no tips
Dones are sparser
Lots of tips are generated 1hr apart
70% of the users have average distance of 150km
10% have 6000km
Frequency is 24hrs in most cases
Take all users and classify
9.T.7
python-highcharts
chart = Highchart()
options = {
    'chart': {'type': 'bar'},
    'title': {'text': 'Highchart bar'},
    'legend': {'enabled': True},
    'xAxis': {'categories': ['User 1', 'User 2']},
    'yAxis': {'title': {'text': 'Number of Followers'}},
}
data1 = [123, 2323, 234, 34]
data2 = [238, 132, 1223, 3443]
chart.set_dict_options(options)
chart.add_data_set(data1, 'bar', name='day1')
chart.add_data_set(data2, 'bar', name='day2')
chart.save_file('bar-chart')
10.1
Location based on other social networks
Twitter is the highest
Only 0.5% of users were geo-tagged
reproducibility of the research