Encoding Social Media Algorithm

The document discusses the encoding algorithms used by social media platforms Instagram and Twitter to manage large volumes of data, including text, images, and videos. It outlines various encoding techniques such as UTF-8 for text, JPEG and WebP for images, and H.264 for videos, as well as the role of machine learning in personalizing content. Additionally, it highlights the importance of security measures like encryption and hashing in protecting user data.

Uploaded by ganganikasak20

1. Abstract
Encoding Algorithms Behind Social Media Platforms (Instagram & Twitter)
Social media platforms like Instagram and Twitter handle vast volumes of data in the form of
text, images, videos, and metadata. To ensure efficient storage, fast transmission, and
optimal user experience, these platforms leverage a variety of encoding algorithms, including
compression, encryption, and machine learning-based encoding.

1. Text Encoding and Compression

 Unicode (UTF-8/UTF-16) is the dominant standard for encoding textual content (posts,
tweets, comments).
 Tokenization and Embeddings: For search, recommendation, and NLP tasks (like
content moderation or sentiment analysis), text is encoded using:
o Byte Pair Encoding (BPE) or WordPiece (used in transformers like BERT).
o Transformer-based models encode text into high-dimensional vectors
(embeddings).
 Compression:
o Standard lossless algorithms like Gzip, Brotli, or Zstandard are applied before
storage or transmission, especially in APIs.
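
The text path described above can be sketched with the Python standard library: a post is encoded to UTF-8 bytes and then compressed losslessly. Here `zlib` (the DEFLATE algorithm that Gzip is built on) stands in for whichever compressor a platform actually uses; the sample post and the helper names `encode_post` and `decode_post` are illustrative assumptions.

```python
import zlib


def encode_post(text: str) -> bytes:
    """Encode text as UTF-8, then compress it with DEFLATE (lossless)."""
    return zlib.compress(text.encode("utf-8"), level=6)


def decode_post(blob: bytes) -> str:
    """Reverse encode_post: decompress, then decode the UTF-8 bytes."""
    return zlib.decompress(blob).decode("utf-8")


# Hypothetical post, repeated to mimic a batch of similar API payloads.
post = "Sunset at the beach 🌅 #travel #photography " * 20
raw = post.encode("utf-8")
packed = encode_post(post)

# The emoji needs 4 bytes in UTF-8, so the byte count exceeds the character count.
print(len(post), "characters ->", len(raw), "UTF-8 bytes ->", len(packed), "compressed bytes")
```

Repetitive payloads such as hashtags and JSON keys compress well, which is why lossless compression pays off most at the API layer.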

2. Image Encoding (Instagram-centric)

 Instagram optimizes for visual content, using:
o JPEG (lossy, widely supported for photos)
o WebP (more efficient compression than JPEG; used for faster loading)
 Images are resized and compressed using:
o Adaptive quality encoding based on:
 Screen size
 Network bandwidth
 Device type
 Instagram may also use progressive JPEG for faster perceptual loading (image loads in
layers).

3. Video Encoding

 Platforms like Instagram use H.264 (AVC) and H.265 (HEVC) codecs for video
compression.
o Bitrate adaptation via MPEG-DASH or HLS (HTTP Live Streaming) ensures
smooth playback on different networks.
 Instagram Reels and Twitter videos use dynamic resolution switching based on user
connectivity.
 Audio is typically encoded using AAC (Advanced Audio Coding).
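
Bitrate adaptation can be illustrated with a simple throughput-based rule: the player measures its recent download speed and picks the best rendition that fits. This is a minimal sketch, not how any particular player implements HLS or MPEG-DASH; the bitrate ladder, the `Rendition` class, and the 0.8 safety factor are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate


# Hypothetical bitrate ladder; real ladders are tuned per platform and codec.
LADDER = [
    Rendition("240p", 240, 400),
    Rendition("480p", 480, 1200),
    Rendition("720p", 720, 2800),
    Rendition("1080p", 1080, 5000),
]


def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> Rendition:
    """Pick the highest-bitrate rendition that fits within a safety margin of measured throughput."""
    budget = throughput_kbps * safety
    affordable = [r for r in LADDER if r.bitrate_kbps <= budget]
    # Fall back to the lowest rendition when even that exceeds the budget.
    return max(affordable, key=lambda r: r.bitrate_kbps) if affordable else LADDER[0]


for measured in (300, 2000, 8000):
    print(measured, "kbps ->", pick_rendition(measured).name)
```

The safety margin leaves headroom for throughput fluctuations, so playback is less likely to stall right after a switch to a higher rendition.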

4. Encryption & Security Encoding

 All media and text content is encrypted in transit using HTTPS/TLS. This is transport encryption between client and server, not end-to-end encryption.
 For internal storage, sensitive data may be encrypted at rest using AES-256.
 User passwords are not stored in plaintext; they are stored as salted hashes using deliberately slow algorithms like bcrypt or scrypt.
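
As an illustration of password hashing, the sketch below uses `hashlib.scrypt` from the Python standard library (available when Python is built against a recent OpenSSL). The cost parameters and the helper names `hash_password` and `verify_password` are assumptions for demonstration, not values any platform is known to use.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters; production values should follow current guidance.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for a password, using a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Only the salt and digest are stored. Because the hash is slow and salted, a leaked database is far more expensive to brute-force than one holding fast, unsalted hashes.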

5. Metadata & Protocol Encoding

 Data exchanged via APIs is usually serialized using:
o JSON or Protocol Buffers (Protobuf) for efficient network communication.
 Metadata such as geolocation, timestamps, and user behavior is encoded for analytics and
personalization.
 Twitter uses Snowflake IDs: time-sortable 64-bit identifiers that pack a timestamp, a machine ID, and a sequence number into a single integer.
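
The bit packing behind a Snowflake ID can be sketched as follows, using the commonly documented layout: 41 bits of milliseconds since a custom epoch, 10 bits of machine ID, and 12 bits of sequence number. The function names are hypothetical, and a real generator also handles clock drift and sequence rollover, which this sketch omits.

```python
TWITTER_EPOCH_MS = 1288834974657  # custom epoch used by Snowflake (November 2010)
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def make_snowflake(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack timestamp, machine ID, and sequence number into one 64-bit integer."""
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    elapsed = timestamp_ms - TWITTER_EPOCH_MS
    return (elapsed << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def parse_snowflake(snowflake: int) -> dict:
    """Unpack the three fields from a Snowflake ID."""
    return {
        "timestamp_ms": (snowflake >> (MACHINE_BITS + SEQUENCE_BITS)) + TWITTER_EPOCH_MS,
        "machine_id": (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1),
        "sequence": snowflake & ((1 << SEQUENCE_BITS) - 1),
    }


earlier = make_snowflake(1_700_000_000_000, machine_id=5, sequence=0)
later = make_snowflake(1_700_000_000_001, machine_id=2, sequence=7)
print(parse_snowflake(later), earlier < later)
```

Because the timestamp occupies the most significant bits, sorting IDs numerically also sorts them by creation time, with no central coordinator needed.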

6. Machine Learning Encoding (Recommendation Systems)

 Content encoding for ranking feeds and ads:
o Instagram and Twitter use deep learning models that encode user interactions
and content features into embedding vectors.
o These embeddings are input to neural ranking models (e.g., DLRMs,
transformers).
 Encoding is crucial for:
o Feed ranking
o Trending topic detection
o Ad targeting
o Spam detection
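
To make the idea of embedding vectors concrete, the sketch below ranks three posts by cosine similarity to a user vector. The three-dimensional vectors and post names are invented for illustration; production embeddings are learned by a model and typically have tens to hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical embeddings: dimensions loosely mean (outdoors, finance, food).
user_embedding = [0.9, 0.1, 0.3]
post_embeddings = {
    "hiking_reel": [0.8, 0.2, 0.4],
    "crypto_thread": [0.1, 0.9, 0.0],
    "food_photo": [0.3, 0.3, 0.9],
}

ranked = sorted(
    post_embeddings,
    key=lambda post_id: cosine_similarity(user_embedding, post_embeddings[post_id]),
    reverse=True,
)
print(ranked)
```

In a full system a similarity like this is usually one input to a ranking model rather than the final score.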
2. Introduction to Instagram and Twitter
🔹 Instagram

Launched in 2010 and acquired by Facebook (now Meta) in 2012, Instagram is a
photo and video-sharing social networking service that has become one of the most widely used
platforms globally. Designed primarily for visual content, Instagram allows users to upload
images and videos, apply filters, share stories, and interact through likes, comments, and direct
messages.

Over the years, Instagram has expanded its features to include Reels, IGTV, Live videos, and
shopping integrations, making it a complex platform that handles high-resolution multimedia
content alongside personalized feeds powered by machine learning algorithms. With over 2
billion monthly active users, Instagram processes massive volumes of media and user data,
making efficient encoding and algorithmic optimization essential for speed, storage, and
personalization.

In the context of this seminar, Instagram serves as a prime example of visual data encoding and
machine learning-driven feed curation.

🔹 Twitter (now X)

Twitter, launched in 2006 and rebranded to X in 2023 under Elon Musk's ownership, is a
platform known for its short-form content, originally limited to 140 characters and later
expanded to 280 characters and beyond. It has become a global hub for real-time
communication, news sharing, and public discourse.

Unlike Instagram’s focus on images and videos, Twitter is primarily text-centric, though it also
supports multimedia content such as images, GIFs, and videos. With hundreds of millions of
daily active users, Twitter's platform depends on fast and lightweight data encoding, real-time
trend analysis, and algorithmic ranking of tweets. Its unique architecture includes the use of
Snowflake IDs for scalable data tracking and embedding-based algorithms for user
recommendations and content filtering.

For this seminar, Twitter demonstrates how textual and behavioral data are encoded and
processed to enable real-time interaction and personalized user experiences.
🔸 Relevance to Encoding Algorithms

Both platforms rely on advanced encoding techniques to handle:

 Text and media content efficiently (compression, formatting)
 User interaction data (likes, comments, retweets, shares)
 Machine learning models that recommend content and detect spam
 Secure transmission and storage of user-generated content

Thus, Instagram and Twitter serve as excellent case studies for understanding how encoding
strategies power modern social media algorithms.
3. Objectives
🔹 1. To Understand the Role of Encoding in Social Media Platforms

Encoding is a fundamental process that enables social media platforms to handle various types of
data—text, images, videos, and user interactions. This seminar aims to help participants
understand how encoding helps in:

 Structuring raw input data (like tweets or images)
 Transforming it into machine-readable formats
 Supporting fast processing, transmission, and storage

For example, Instagram uses JPEG encoding for images, while Twitter uses UTF-8 encoding for
text tweets.

🔹 2. To Analyze the Different Types of Encoding Techniques Used by Instagram and Twitter

Both Instagram and Twitter deal with different types of content. This objective focuses on
analyzing:

 Text encoding: UTF-8, Byte Pair Encoding (BPE), WordPiece, etc.
 Image encoding: JPEG, WebP (used by Instagram for efficient media loading)
 Video encoding: H.264, HEVC (used for Reels, Stories, Twitter videos)
 Behavioral encoding: Converting user interactions into data models used for feed
ranking and recommendation

Understanding these techniques helps reveal how content is compressed, transmitted, and
analyzed in real time.

🔹 3. To Study How Machine Learning Uses Encoded Data to Personalize Content

Social media platforms don’t just show content randomly—they use machine learning models
that depend on encoded data. This objective explores:

 How user data (likes, follows, watch time) is encoded into embedding vectors
 How these embeddings feed into recommendation algorithms
 How encoded content features (e.g., hashtags, visual patterns) are used to personalize the
feed
Instagram’s Explore Page and Twitter’s “For You” timeline are prime examples of algorithmic
content personalization using encoded data.

🔹 4. To Examine How Encoding Improves Performance and Efficiency

Efficient encoding ensures that social media apps:

 Load faster even on slow networks
 Store more data with less space
 Handle millions of simultaneous users

This objective focuses on studying how:

 Compression algorithms reduce file size (e.g., WebP vs. JPEG)
 Data serialization (e.g., JSON, Protobuf) reduces network latency
 Streaming protocols (e.g., HLS, MPEG-DASH) support adaptive video playback

🔹 5. To Understand the Security Aspects of Encoding (Encryption & Data Protection)

Encoding also plays a key role in data privacy and protection. This objective includes:

 Studying how data is encrypted using protocols like TLS (Transport Layer Security)
 Understanding hashing techniques like bcrypt for securing login information
 Exploring how platforms securely encode and transmit personal user data

🔹 6. To Compare and Contrast Encoding Approaches Between Instagram and Twitter

Even though both platforms serve social content, their encoding needs differ:

 Instagram is media-heavy, so image and video encoding is prioritized.
 Twitter is text-heavy, so NLP-based text encoding and tweet ranking are more
prominent.

This objective involves a side-by-side comparison of:

 Encoding standards used
 Data flow architecture
 ML models and recommendation systems
🔹 7. To Explore the Impact of Encoding on Scalability and Real-Time Data
Processing

Platforms like Twitter handle real-time trending topics and live tweets, while Instagram
processes millions of story views per second. This objective focuses on:

 How encoded data supports horizontal scaling
 How platforms process and respond to user data in real time
 Technologies like Twitter’s Snowflake ID used for efficient time-based encoding of
events
4. Review of Literature
Literature Review on Encoding Algorithms in Instagram & Twitter
| Topic | Author(s) | Year | What Was Found / Proposed | Relevance to Instagram & Twitter Encoding |
|---|---|---|---|---|
| 1. Text Encoding and NLP | Vaswani et al. | 2017 | Introduced the Transformer model using attention mechanisms for efficient text encoding. | Inspired Twitter to use embedding-based representations for tweet ranking, sentiment analysis, and filtering. |
| 1. Text Encoding and NLP | Ghosh et al. | 2020 | Analyzed how BPE and WordPiece tokenization segment text for better processing. | Helps Twitter/Instagram break down captions or tweets for NLP tasks, improving model accuracy. |
| 2. Image and Video Encoding | Wang et al. | 2019 | Adaptive image compression improves speed while preserving quality. | Instagram uses JPEG and adaptive compression for fast image delivery. |
| 2. Image and Video Encoding | Li et al. | 2021 | Described adaptive bitrate streaming (HLS, MPEG-DASH) for video performance. | Used by Instagram for smooth video playback across network types. |
| 3. ML-Based Feed Ranking Algorithms | Facebook AI Research | 2019 | Introduced DLRM for feed ranking using dense/sparse feature encoding. | Applied by Instagram to rank posts based on user behavior and content signals. |
| 3. ML-Based Feed Ranking Algorithms | Twitter Engineering | 2023 (public release) | Published ranking models using tweet/user embeddings for timeline ranking. | Core to how Twitter orders the Home feed, using neural networks over encoded interaction data. |
| 4. Data Encoding & Privacy | Feldman et al. | 2015 | Discussed encoding in security: TLS for transport, bcrypt for password hashing. | Twitter and Instagram secure data transmission and storage using these methods. |
| 4. Data Encoding & Privacy | Twitter Engineering | 2010 | Created Snowflake IDs: scalable, 64-bit encoded identifiers. | Allows Twitter to assign unique, time-ordered IDs to tweets efficiently. |
| 5. Content Delivery Optimization | Instagram Engineering | Various | Used progressive JPEGs and WebP formats to reduce image load time. | Instagram optimizes image delivery through encoded formats. |
| 5. Content Delivery Optimization | Twitter Engineering | Various | Uses Protobuf encoding to compress API data. | Enables faster, lighter data transfer for mobile/web apps. |
| 6. Real-Time Data Encoding | Zhao et al. | 2021 | Encoding used in real-time analytics to track trends and moderate content. | Both platforms encode user interaction streams to detect and respond to trending topics or violations. |
5. Tools / Technology / Methodology for ML-Based
Feed Ranking Algorithms

A. General Workflow of Ranking with Machine Learning


1. Candidate Generation
 From millions (sometimes billions) of possible posts, generate a smaller set of
candidates (think: millions of posts narrowed down to ~500).
 Methods include collaborative filtering, approximate nearest neighbor search, or
user–creator graph sampling.
 Example: “What are the 500 most likely posts this user might interact with?”
2. Scoring → Machine Learning Models
 Assign a relevance score predicted by a trained model.
 Model input = features (~hundreds to thousands of them).
 Model output = probability prediction such as P(user likes post), P(user shares
post), P(user watches > 10s).
3. Ranking
 Combine multiple scores into a single overall ranking score.
 Posts are then sorted descending → top posts go to the feed.
4. Post-ranking adjustments
 Apply business rules: diversity (avoid 20 posts from the same person), freshness
boost, demotion of reported/spammy content, policy filters.
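
The four stages above can be sketched end to end on a toy pool of posts. Everything here is an assumption made for illustration: the pool, the threshold that stands in for real retrieval methods such as approximate nearest neighbor search, the freshness formula, and the per-author cap.

```python
from collections import Counter

# Hypothetical pool: (post_id, author, predicted engagement, hours since posting)
POOL = [
    ("p1", "alice", 0.90, 2),
    ("p2", "alice", 0.85, 1),
    ("p3", "alice", 0.80, 5),
    ("p4", "bob", 0.60, 3),
    ("p5", "carol", 0.40, 30),
    ("p6", "dave", 0.05, 1),
]


def generate_candidates(pool, min_score=0.1):
    """Stage 1: cheap filter that discards clearly irrelevant posts."""
    return [post for post in pool if post[2] >= min_score]


def score(post):
    """Stages 2-3: combine the model prediction with a freshness boost."""
    _, _, p_engage, age_hours = post
    freshness = 1.0 / (1.0 + age_hours / 24.0)
    return p_engage * freshness


def rerank_with_diversity(ranked, max_per_author=2):
    """Stage 4: business rule capping how many posts one author contributes."""
    seen = Counter()
    feed = []
    for post in ranked:
        author = post[1]
        if seen[author] < max_per_author:
            feed.append(post)
            seen[author] += 1
    return feed


candidates = generate_candidates(POOL)
ranked = sorted(candidates, key=score, reverse=True)
feed = rerank_with_diversity(ranked)
print([post_id for post_id, *_ in feed])
```

Keeping the stages as separate functions mirrors the real design choice: the cheap first stage runs over the whole pool, while the more expensive scoring only sees the survivors.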

B. Algorithms Commonly Used in Feed Ranking


Here’s where it gets technical — the exact classes of machine learning models developers
actually use:
1. Logistic Regression (Early Stage)
 Use: Predict simple binary outcomes like “Will user like this post (yes/no)?”
 Pros: Fast, interpretable.
 Cons: Linear, weak expressive power.
 Historical note: Logistic regression was a common early baseline for engagement and
click prediction before platforms moved to tree-based and neural models.
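
A minimal logistic regression for the "will the user like this post?" question can be written in plain Python. The two features, the six training rows, and the learning rate are made up for illustration; real systems train on far more data with library implementations.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features) -> float:
    """Probability of a like: sigmoid of a weighted sum of the features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, labels, lr=0.5, epochs=500):
    """Fit weights with stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical features: [follows_author (0/1), share of the author's past posts liked]
X = [[1, 0.9], [1, 0.6], [0, 0.1], [0, 0.0], [1, 0.2], [0, 0.5]]
y = [1, 1, 0, 0, 1, 0]

weights, bias = train(X, y)
print(round(predict(weights, bias, [1, 0.8]), 3), round(predict(weights, bias, [0, 0.05]), 3))
```

The learned weights are directly readable (a large positive weight on `follows_author` means following matters), which is the interpretability advantage listed above.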

2. Gradient Boosted Decision Trees (GBDT)


 Famous libraries: XGBoost, LightGBM, CatBoost.
 Use: Predict complex engagement probabilities, handling both continuous and
categorical features.
 Pros: Powerful for tabular data (rich user + post features).
 Cons: Harder to scale to online, real-time updates compared to neural nets.
 Example real-world: LinkedIn originally used GBDTs for ranking job posts and has
published papers on this.

3. Neural Networks (Deep Learning)


 Use: Capture non-linear relationships → “If user watches videos of type X and always
saves posts from creator Y, then boost similar ones.”
 Architecture: Usually Multi-Layer Perceptrons (MLPs) with feature embeddings.
 Examples:
 Instagram Reels/TikTok: Optimize for watch-time → use deep neural networks
predicting user retention curve.
 YouTube Deep Neural Recommendation: Two-stage model: candidate
generation using a deep retrieval network, and ranking via an MLP that predicts
engagement probabilities.

4. Learning-to-Rank (L2R) Systems


These are specific algorithms designed for ranking rather than classification:
 Pointwise models: Predict probabilities individually for each post (e.g., logistic
regression/GBDT/neural nets per item).
 Pairwise models: Train models to decide “between post A and post B, which should
rank higher?” (e.g., RankNet by Microsoft).
 Listwise models: Consider an entire ranked list during training, optimizing ranking
metrics directly (e.g., LambdaRank, LambdaMART).
🔧 Large platforms are generally understood to combine learning-to-rank objectives with deep
models, because optimizing ranking quality directly tends to improve the feed.
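
The pairwise idea can be made concrete with the RankNet-style loss, which penalizes a model whenever the post that should rank higher does not receive the higher score. The scores and relevance labels below are invented for illustration, and `pairwise_loss` is a hypothetical helper name.

```python
import math


def ranknet_loss(score_preferred: float, score_other: float) -> float:
    """Pairwise logistic loss: -log(sigmoid(s_i - s_j)) = log(1 + exp(-(s_i - s_j)))."""
    return math.log1p(math.exp(-(score_preferred - score_other)))


def pairwise_loss(scores: list[float], relevance: list[int]) -> float:
    """Average the loss over every pair where item i is more relevant than item j."""
    pairs = [
        (i, j)
        for i in range(len(scores))
        for j in range(len(scores))
        if relevance[i] > relevance[j]
    ]
    return sum(ranknet_loss(scores[i], scores[j]) for i, j in pairs) / len(pairs)


relevance = [2, 1, 0]           # hypothetical labels: post 0 is the most relevant
good_scores = [3.0, 1.0, -1.0]  # model agrees with the labels
bad_scores = [-1.0, 1.0, 3.0]   # model ranks the posts in reverse

print(pairwise_loss(good_scores, relevance), pairwise_loss(bad_scores, relevance))
```

A pointwise model would judge each score in isolation. Here only the differences between scores matter, which matches the goal of getting the order right.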

5. Reinforcement Learning (RL) — Experimental / Advanced


 Framing feed ranking as: “Which content sequence maximizes long-term satisfaction
(not just immediate clicks)?”
 Uses bandit algorithms and reinforcement learning to balance exploration (show
new/unproven content) vs. exploitation (show what’s known to work).
 Example: TikTok research has hinted at RL-style approaches for continuously optimizing
watch sessions.
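
The exploration versus exploitation trade-off can be sketched with an epsilon-greedy bandit, one of the simplest bandit algorithms. The content categories, their "true" engagement rates, and the simulated user are all assumptions; nothing here reflects a specific platform's system.

```python
import random


class EpsilonGreedy:
    """Pick the best-known arm most of the time and a random arm with probability epsilon."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per arm

    def select(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))   # explore
        return max(self.values, key=self.values.get)    # exploit

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical true engagement rates, unknown to the bandit.
TRUE_RATES = {"news": 0.05, "popular_creator": 0.20, "new_creator": 0.50}

bandit = EpsilonGreedy(TRUE_RATES, epsilon=0.1, seed=42)
user = random.Random(7)  # simulated user feedback
for _ in range(5000):
    arm = bandit.select()
    reward = 1.0 if user.random() < TRUE_RATES[arm] else 0.0
    bandit.update(arm, reward)

best = max(bandit.values, key=bandit.values.get)
print(best, bandit.counts)
```

Without the random exploration step, the bandit would keep showing whichever category happened to earn a reward first and might never discover that unproven content performs better.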

C. Features Used for Training (Inputs to Models)


The magic of ML ranking comes from using huge feature sets. Twitter's 2023 write-up of its
recommendation algorithm describes a ranking model that considers thousands of features. Examples:
 User features: scrolling speed, device type, session length, past engagement.
 Post features: type (video/text), topic/hashtags, virality.
 User–post interaction features: whether you followed post’s author, past interactions
with them.
 Context features: time of day, location, trending context.
All features are converted to numeric vectors so models can process them: numeric values are scaled, and categorical values or IDs are mapped to learned embeddings.
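
A small sketch of turning one raw feature record into a numeric vector follows. The feature names, the scaling choices, and the `POST_TYPES` vocabulary are hypothetical. One-hot encoding is used here for simplicity; high-cardinality fields such as user or post IDs would normally go through learned embeddings instead.

```python
POST_TYPES = ["text", "image", "video"]  # hypothetical categorical vocabulary


def one_hot(value: str, vocabulary: list[str]) -> list[float]:
    """Encode a category as a vector with a single 1.0 at its position."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_features(raw: dict) -> list[float]:
    """Turn a raw feature record into a flat numeric vector."""
    vector = []
    vector.append(min(raw["session_minutes"] / 60.0, 1.0))  # scaled to [0, 1], capped at one hour
    vector.append(1.0 if raw["follows_author"] else 0.0)    # user-post interaction feature
    vector.extend(one_hot(raw["post_type"], POST_TYPES))    # post feature
    vector.append(raw["hour_of_day"] / 23.0)                # context feature
    return vector


example = {"session_minutes": 15, "follows_author": True, "post_type": "video", "hour_of_day": 23}
print(encode_features(example))
```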

D. Example of ML Ranking Pipeline


(simplified)
1. Input: User = @JohnDoe. Candidates = 500 posts from network + trending.
2. Model (Neural Net):
 Inputs: [user embedding, post embedding, engagement history, recency
factor, etc.]
 Output: P(like)=0.6, P(comment)=0.2, P(share)=0.1.
3. Weighted score: 0.5×P(like) + 0.3×P(comment) + 0.2×P(share) = 0.5×0.6 + 0.3×0.2 + 0.2×0.1 = 0.38.
4. Compare scores → produce ranked feed.
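
The weighted-score step can be checked with a few lines of code that evaluate the weighted sum for the probabilities in the example and then sort candidates by it. The second post and the identifiers `post_a` and `post_b` are hypothetical additions so that there is something to rank against.

```python
# Weights from the example: likes count most, then comments, then shares.
WEIGHTS = {"like": 0.5, "comment": 0.3, "share": 0.2}


def weighted_score(predictions: dict) -> float:
    """Combine per-action probabilities into a single ranking score."""
    return sum(WEIGHTS[action] * p for action, p in predictions.items())


candidates = {
    "post_a": {"like": 0.6, "comment": 0.2, "share": 0.1},   # probabilities from the example
    "post_b": {"like": 0.2, "comment": 0.1, "share": 0.05},  # hypothetical second post
}

feed = sorted(candidates, key=lambda post_id: weighted_score(candidates[post_id]), reverse=True)
for post_id in feed:
    print(post_id, round(weighted_score(candidates[post_id]), 2))
```

Changing the weights changes the feed: raising the share weight, for example, favors posts that spread over posts that merely collect likes.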

E. Tools & Technology Stack Used by Developers


 Offline training: TensorFlow, PyTorch (for neural nets), XGBoost/LightGBM (for GBDTs).
 Feature engineering: Spark, Hadoop, Dataflow for processing large datasets.
 Real-time serving:
 Feature stores (Feast, internal systems).
 Recommendation pipelines (Caffe2 at Meta, TwML at Twitter, TensorFlow
Serving at Google).
 Real-time stream ingestion (Kafka).
 Optimization metric: Companies often optimize for predicted probability of
engagement, but increasingly also include long-term satisfaction surveys and content
quality metrics.
6. Conclusions
1. Algorithms as Ranking Systems
 At their core, social media algorithms are nothing mystical: they are highly
optimized ranking systems.
 The essential task is: collect possible posts → evaluate them with a scoring
function (based on many signals) → sort → present to user.
 This ranking balances an abundance of signals such as recency, popularity,
relationship strength, content type, and predicted engagement.
2. Transparency via Simplified Encoding
 By “encoding” a toy version of the algorithm (e.g., scoring posts with recency,
likes, comments, and relationship strength), we reveal how ranking choices
shape what users see.
 Such simplifications provide a transparent educational model, exposing the logic
behind black-box systems that ordinarily run invisibly.
3. Scalability to Industry Systems
 Real platforms (Instagram, Twitter/X, TikTok, YouTube) use the same conceptual
foundation but on an industrial scale.
 Instead of 3–4 features (recency, likes), companies use hundreds or thousands
of features: device type, session length, watch-time detail, follow graph
strength, embeddings of user and content.
 Algorithms have evolved from simple logistic regression to gradient boosted
decision trees, deep neural networks, and learning-to-rank systems.
 The pipelines ingest billions of events daily and must produce personalized feeds
in real time (millisecond latency).
4. Algorithms Encode Values, Not Neutrality
 Algorithms are not objective referees; they reflect platform priorities:
engagement, retention, and ad revenue.
 For instance:
 Instagram gives higher weight to saves (because they suggest long-term
interest).
 TikTok optimizes for watch-time (because videos keep users scrolling).
 Twitter emphasizes recency + engagement (to promote fresh activity).
 Thus, encoded priorities affect what dominates user feeds: novelty, virality,
outrage, or entertainment — all rooted in particular values chosen by
businesses.
5. Necessity of Critical Understanding
 Society relies on these feeds for news, politics, education, and even health
awareness.
 Therefore, public understanding and open discussion of algorithms is necessary
for:
 Innovation → designing better, more user-centric ranking methods.
 Ethics → ensuring fairness, diversity, reducing misinformation, and
prioritizing user well-being.
 Responsibility → making companies accountable for their algorithmic
choices.
 Transparency, thoughtful scrutiny, and responsible design are crucial.
7. References
1. Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014).
“Experimental evidence of massive-scale emotional contagion through social
networks.” Proceedings of the National Academy of Sciences (PNAS), 111(24), 8788–
8790.
 This controversial Facebook study demonstrated how small changes in News
Feed ranking could influence the emotional tone of users' posts, showing the
powerful psychological effects of algorithms.
2. Gillespie, T. (2018).
Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions
That Shape Social Media. Yale University Press.
 Analyzes how algorithms and content moderation serve as invisible forces
shaping culture, raising questions about power, responsibility, and transparency
in platform governance.
3. Pariser, E. (2011).
The Filter Bubble: What the Internet Is Hiding from You. Penguin Press.
 Early, influential critique that argued personalization algorithms risk creating
echo chambers, limiting diversity of information and reinforcing biases.
4. Twitter Engineering (2023).
“A Closer Look at Our Recommendation Algorithm.” Official Twitter Engineering Blog.
 Twitter published parts of its feed algorithm, describing a multi-stage
recommendation pipeline: candidate sourcing of roughly 1,500 tweets, neural-network
scoring over thousands of features, and post-ranking heuristics and filters. Provides
insight into how large-scale ML ranking operates in industry.
5. Meta (Instagram) Documentation (2021–2023).
“How Ranking Works Across Instagram.” Meta Help Center.
 Explains how Instagram ranks Feeds, Stories, Reels, and Explore using slightly
different weighting systems. Highlights role of signals like saves, shares, and
watch-time in determining visibility.
