Social media analysis using
Graph database
By
G. Harsha Vardhan Reddy - RA2311026050219
P. Dileep kumar reddy - RA2311026050250
K. Sandeep - RA2311026050116
Introduction
Social media analytics is the process of
collecting, analyzing, and interpreting data
from social media platforms to support business
decisions, understand audience behavior, and
measure performance. As billions of users
engage across platforms like Facebook,
Instagram, X (formerly Twitter), LinkedIn, and
TikTok, they generate vast amounts of data—
likes, shares, comments, mentions, and more—
that can reveal valuable insights.
Problem statement
Social media platforms generate an immense
volume of dynamic, interconnected data—users
engaging with content, forming communities,
sharing opinions, and influencing trends.
Analyzing these interactions is critical for
understanding user behavior, detecting
influential nodes, tracking viral content, and
uncovering hidden communities.
However, traditional relational databases are
inefficient in handling the complex and highly
connected nature of social media data.
Performing queries like “Who are the top
influencers in a topic-based subnetwork?” or
“Which communities are spreading
misinformation?” becomes computationally
expensive and difficult to scale, because each
additional hop through the network requires
another table join.
There is a need for a more flexible and
efficient data model that can naturally
represent and analyze relationships between
users, posts, hashtags, and interactions.
This project addresses the challenge by utilizing
graph databases—such as Neo4j—to:
• Represent social media data as a graph of
nodes (users, posts, hashtags) and edges
(likes, follows, mentions)
• Perform deep network analysis (e.g.,
centrality, clustering, shortest paths)
• Enable scalable, real-time querying of social
relationships and engagement patterns
• Support advanced analytics like influencer
detection, trend mapping, and community
analysis
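The graph model and the shortest-path analysis listed above can be sketched without a database. This is a minimal illustration using only the Python standard library; the sample users, the FOLLOWS/LIKES/POSTED edge types, and the helper names build_adjacency and shortest_path are assumptions made for this example, not part of the project code.

```python
from collections import deque

# Hypothetical sample data: (source, relationship type, target) triples.
EDGES = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "dave"),
    ("alice", "LIKES", "post1"),
    ("dave", "POSTED", "post1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring relationship types."""
    adjacency = {}
    for source, _rel_type, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest node list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for neighbor in sorted(adjacency[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


ADJACENCY = build_adjacency(EDGES)
print(shortest_path(ADJACENCY, "alice", "dave"))
```

In this sample, alice reaches dave in two hops through the post she liked, which is shorter than the three-hop chain of FOLLOWS edges. A graph database answers the same kind of question with a path query instead of hand-written traversal code.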
Needs of social media analysis
• Increase customer acquisition
• Protect brand health
• Lower customer care costs
• Maximize campaign performance
• Boost campaign performance
• Improve crisis management
Scope of the project
• Graph-Based Modeling: Social media data
(users, posts, hashtags, likes, comments,
etc.) will be modeled using nodes and
relationships in a graph database like Neo4j.
• Relationship Analysis: Analyze user
connections, content interactions, and
influence networks through graph
algorithms.
• Influencer Detection: Identify key users
based on network centrality measures (e.g.,
PageRank, betweenness centrality).
• Community Detection: Uncover groups or
clusters of users interacting around common
topics or hashtags.
• Trend and Topic Mapping: Trace how
specific content or topics propagate through
the network.
• Visualization: Graph-based visual
representations of user relationships and
data flow to enhance understanding.
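The influencer detection item above names PageRank as a centrality measure. This is a minimal power-iteration sketch in plain Python, written only to show the idea; the sample FOLLOWS edges, the damping value of 0.85, and the fixed iteration count are assumptions, and a real project would more likely call a library or database implementation.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list.

    An edge (a, b) means "a follows b", so rank flows from a to b.
    """
    nodes = sorted({user for edge in follows for user in edge})
    if not nodes:
        return {}
    out_links = {user: [] for user in nodes}
    for follower, followed in follows:
        out_links[follower].append(followed)

    count = len(nodes)
    rank = {user: 1.0 / count for user in nodes}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {user: base for user in nodes}
        for user in nodes:
            if out_links[user]:
                share = damping * rank[user] / len(out_links[user])
                for target in out_links[user]:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical sample: three users follow alice, alice follows bob.
FOLLOWS = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
    ("dave", "carol"),
]
RANKS = pagerank(FOLLOWS)
TOP_USER = max(RANKS, key=RANKS.get)
print(TOP_USER)
```

Here alice ranks highest because she is followed by the most users. bob ranks second despite having a single follower, because that follower is alice. That is how PageRank differs from counting followers.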
Prerequisite
• Language - Python
• Libraries - py2neo, pandas
• Database - Neo4j
System Architecture
1. Social Media
Source of raw data.
This refers to platforms like Twitter, Facebook, Reddit, Instagram, etc.,
from which user-generated content (posts, tweets, reviews, etc.) is
collected.
2. Data Collection
Purpose: To gather raw data from social media sources.
Tools: APIs (e.g., Twitter API), web scraping tools (BeautifulSoup,
Scrapy).
Output: Unstructured text data including posts, comments, hashtags,
user info, etc.
3. Data for Analysis
Purpose: Prepares collected data for processing.
Activities: Data formatting, cleaning (removing URLs, tags), language
detection, etc.
Output: Structured or semi-structured data ready for filtering.
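The cleaning activities in this step can be sketched with regular expressions. This is an illustrative example only: it reads "tags" as HTML tags, which is an assumption, and the function names clean_post and extract_hashtags are hypothetical. Hashtags are extracted rather than removed, since they become nodes in the graph model.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TAG_PATTERN = re.compile(r"<[^>]+>")
WHITESPACE_PATTERN = re.compile(r"\s+")
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def clean_post(text):
    """Remove URLs and HTML tags, then collapse repeated whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = TAG_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


def extract_hashtags(text):
    """Return hashtags in order of appearance, without the leading '#'."""
    return HASHTAG_PATTERN.findall(text)


raw = "Check <b>this</b> out https://example.com/x now"
print(clean_post(raw))
print(extract_hashtags("Loving #Python and #Neo4j!"))
```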
4. Noise Filter
Purpose: To remove irrelevant or unimportant data.
Methodologies:
Keyword filtering
Language models
Rule-based or ML-based classifiers
Output:
Relevant Reviews: Posts that are useful for further analysis (e.g.,
product reviews, opinions).
Irrelevant Reviews: Spam, advertisements, unrelated text, etc.
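Keyword filtering, the simplest of the methodologies listed for this step, can be sketched as follows. The keyword sets and the function names are hypothetical and chosen only for illustration; a real filter would use lists tuned to the topic under study, or one of the classifier-based approaches.

```python
import re

# Hypothetical keyword lists for a product-review use case.
SPAM_KEYWORDS = {"giveaway", "discount", "subscribe"}
TOPIC_KEYWORDS = {"battery", "camera", "screen"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_relevant(text):
    """A post is relevant if it mentions the topic and has no spam keyword."""
    tokens = set(tokenize(text))
    if tokens & SPAM_KEYWORDS:
        return False
    return bool(tokens & TOPIC_KEYWORDS)


def split_reviews(posts):
    """Split posts into (relevant, irrelevant) lists, preserving order."""
    relevant = [post for post in posts if is_relevant(post)]
    irrelevant = [post for post in posts if not is_relevant(post)]
    return relevant, irrelevant


POSTS = [
    "The camera is great",
    "Huge discount on camera gear, subscribe now",
    "Good morning everyone",
]
RELEVANT, IRRELEVANT = split_reviews(POSTS)
print(RELEVANT)
```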
5. Sentiment & Emotion Analysis
Purpose: To detect the mood or tone of the text (positive, negative,
neutral) and emotions (joy, anger, fear, etc.).
Techniques:
Sentiment analysis using NLP (VADER, TextBlob, BERT).
Emotion classification using models trained on emotional datasets
(NRC, DeepMoji).
Output: Each post is tagged with sentiment and possibly emotion
categories.
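To show what tagging a post with sentiment looks like, here is a toy lexicon-based scorer. It is a stand-in for illustration, not VADER, TextBlob, or BERT: the word lists are hypothetical and tiny, and the scorer ignores negation, intensity, and emoji, which the real tools handle.

```python
import re

# Hypothetical, deliberately tiny word lists.
POSITIVE_WORDS = {"love", "great", "good", "happy", "excellent"}
NEGATIVE_WORDS = {"hate", "bad", "terrible", "angry", "poor"}


def sentiment_label(text):
    """Label text by counting positive words minus negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(token in POSITIVE_WORDS for token in tokens)
    score -= sum(token in NEGATIVE_WORDS for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in ["I love this great phone", "Terrible battery, bad screen"]:
    print(post, "->", sentiment_label(post))
```

The resulting label could be stored as a property on each Post node, so that graph queries can combine sentiment with network structure.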
6. Predictive Analysis
Purpose: To extract trends, make forecasts, or derive insights from the
sentiment/emotion-labeled data.
Use Cases:
Predicting public opinion over time.
Forecasting user engagement or virality.
Identifying emerging issues or interests.
Techniques: Machine Learning models (SVM, Random Forest, LSTM),
time series analysis, clustering.
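As a baseline for the time series analysis mentioned above, a moving average and a least-squares trend slope can be computed on daily counts (for example, mentions or positive posts per day). This sketch is far simpler than the SVM, Random Forest, or LSTM models listed; the function names and sample counts are hypothetical.

```python
def moving_average_forecast(counts, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if window <= 0 or len(counts) < window:
        raise ValueError("need at least `window` observations")
    recent = counts[-window:]
    return sum(recent) / window


def linear_trend_slope(counts):
    """Least-squares slope of counts against their index (change per step)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical daily mention counts for one hashtag.
DAILY_MENTIONS = [10, 12, 14, 16, 18]
print(moving_average_forecast(DAILY_MENTIONS))
print(linear_trend_slope(DAILY_MENTIONS))
```

A positive slope suggests growing engagement. On a steadily rising series the moving average lags behind the latest value, which is one reason the more capable models listed above may be preferred.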
7. Result Views
Purpose: To present the analysis results in a user-friendly manner.
Tools:
Dashboards (using Streamlit, Dash, or Power BI).
Graph visualizations, charts (pie, bar, line).
Implementation steps
pip install py2neo networkx matplotlib
from py2neo import Graph
# Connect to the Neo4j database
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
from py2neo import Node, Relationship
# Create user nodes
user1 = Node("User", name="Alice")
user2 = Node("User", name="Bob")
user3 = Node("User", name="Carol")
# Create post nodes
post1 = Node("Post", content="Exploring graph databases!")
post2 = Node("Post", content="Loving Python and Neo4j!")
# Write the nodes to the database, then create relationships between them
graph.create(user1 | user2 | user3 | post1 | post2)
graph.create(Relationship(user1, "FRIEND", user2))
graph.create(Relationship(user2, "FRIEND", user3))
graph.create(Relationship(user1, "POSTED", post1))
graph.create(Relationship(user2, "POSTED", post2))
# Find all friends of Alice
friends_of_alice = graph.run(
    "MATCH (a:User {name: 'Alice'})-[:FRIEND]->(f) RETURN f.name AS friend"
).data()
print("Friends of Alice:", [friend['friend'] for friend in friends_of_alice])
# Find posts made by Bob
posts_by_bob = graph.run(
    "MATCH (b:User {name: 'Bob'})-[:POSTED]->(p) RETURN p.content AS post"
).data()
print("Posts by Bob:", [post['post'] for post in posts_by_bob])
import networkx as nx
import matplotlib.pyplot as plt
# Create a NetworkX graph from Neo4j data
G = nx.Graph()
# Add nodes and edges to the NetworkX graph
for node in graph.nodes.match("User"):
    G.add_node(node["name"])
# Use only FRIEND relationships: POSTED edges end at Post nodes, which have no name
for rel in graph.relationships.match(r_type="FRIEND"):
    G.add_edge(rel.start_node["name"], rel.end_node["name"])
# Draw the network
plt.figure(figsize=(8, 6))
nx.draw(G, with_labels=True, node_size=3000, node_color="skyblue",
        font_size=10, font_weight="bold")
plt.title("Social Media Network")
plt.show()
Output
Future scope
• Mapping influence chains (who influences whom)
• Detecting communities or cliques
• Identifying key opinion leaders (KOLs) and brand
ambassadors
• Tracing information spread (virality, fake news
propagation)
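Tracing information spread can be sketched as a breadth-first walk over reshare edges, recording how many hops each user is from the original poster. The RESHARES sample data and the function name trace_spread are hypothetical; in a Neo4j-backed version the same question would be asked with a variable-length path query.

```python
from collections import deque


def trace_spread(reshares, origin):
    """Return {user: hops from origin}, following reshare edges outward.

    Each edge (source, resharer) means `resharer` reshared from `source`.
    """
    children = {}
    for source, resharer in reshares:
        children.setdefault(source, []).append(resharer)

    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        for next_user in children.get(user, []):
            if next_user not in hops:
                hops[next_user] = hops[user] + 1
                queue.append(next_user)
    return hops


# Hypothetical reshare edges; ("zed", "yan") belongs to an unrelated cascade.
RESHARES = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("dave", "erin"),
    ("zed", "yan"),
]
CASCADE = trace_spread(RESHARES, "alice")
print(CASCADE)
```

The largest hop count gives the depth of the cascade and the number of users gives its size, two common measures of virality.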