Unit 1 Final

The document provides an introduction to recommender systems, detailing their purpose, types, and underlying principles. It explains the processes involved in generating personalized recommendations, including data collection, processing, and algorithm selection, while also discussing various types of recommender systems such as content-based, collaborative filtering, and hybrid systems. Additionally, it highlights the goals and examples of successful recommender systems in various industries, emphasizing their role in enhancing user experiences.

CCS360 – RECOMMENDER SYSTEMS

UNIT-1: INTRODUCTION

UNIT I INTRODUCTION 6
Introduction and basic taxonomy of recommender systems - Traditional and non-personalized
Recommender Systems - Overview of data mining methods for recommender systems-
similarity measures- Dimensionality reduction – Singular Value Decomposition (SVD)
Suggested Activities:
• Practical learning – Implement Data similarity measures.
• External Learning – Singular Value Decomposition (SVD) applications
Suggested Evaluation Methods:
• Quiz on Recommender systems.
• Quiz on Python tools available for implementing Recommender systems
INTRODUCTION:
Recommender systems, also known as recommendation systems or engines, are a
type of software application designed to provide personalized suggestions or
recommendations to users. These systems are widely used in various online platforms and
services to help users discover items or content of interest. Recommender systems
leverage data about users' preferences, behaviors, and interactions to generate accurate
and relevant recommendations.

WHAT ARE RECOMMENDER SYSTEMS?


Recommender systems are algorithms designed to suggest relevant products or content
to users. They play a key role in enhancing user experiences on various online
platforms, including e-commerce websites, streaming services, and social media.
Essentially, recommender systems aim to analyze user data and behavior to make tailored
recommendations.
• Data collection: Recommender systems start by gathering data on user interactions,
preferences, and behaviors. This data can include past purchases, browsing history,
ratings, and social connections.
• Data processing: Once collected, they process the data to extract meaningful patterns
and insights. This involves techniques like data cleaning, transformation, and feature
engineering.
• Algorithm selection: Depending on the specific platform and its data, a specific
recommender algorithm is applied to generate recommendations. Common types
include collaborative filtering, content-based filtering, and hybrid methods.
• User profiling: Using historical data, recommender systems create user profiles. These
represent their preferences, interests, and behavior, allowing the system to understand
individual tastes.
• Item profiling: Similarly, items or content available on the platform are also profiled
based on their characteristics. Think of attributes like genres, keywords, or product
features.
• Recommendation generation: The next step involves algorithms matching user
profiles with item profiles. For example, collaborative filtering identifies users with
similar preferences and recommends items liked by others with similar profiles.
Content-based filtering recommends items based on the attributes of items users have
previously interacted with.

• Ranking and presentation: Finally, the recommended items are ranked based on their
relevance to the user. The top-ranked items are then presented to the user through
interfaces like recommendation lists, personalized emails, or pop-up suggestions.

What is the basic principle that underlies the working of recommendation algorithms?
The basic principle of recommendations is that significant dependencies exist
between user and item-centric activity. For example, a user who is interested in a
historical documentary is more likely to be interested in another historical documentary
or an educational program, rather than in an action movie. In many cases, various
categories of items may show significant correlations, which can be leveraged to make
more accurate recommendations.
Alternatively, the dependencies may be present at the finer granularity of
individual items rather than categories. These dependencies can be learned in a data-
driven manner from the ratings matrix, and the resulting model is used to make predictions
for target users.
The larger the number of rated items that are available for a user, the easier it is
to make robust predictions about the future behavior of the user. Many different learning
models can be used to accomplish this task. For example, the collective buying or rating
behavior of various users can be leveraged to create cohorts of similar users that are
interested in similar products. The interests and actions of these cohorts can be leveraged
to make recommendations to individual members of these cohorts.
The above-mentioned description is based on a very simple family of
recommendation algorithms, referred to as neighborhood models. This family belongs to
a broader class of models, referred to as collaborative filtering. The term “collaborative
filtering” refers to the use of ratings from multiple users in a collaborative way to predict
missing ratings. In practice, recommender systems can be more complex and data-rich,
with a wide variety of auxiliary data types. For example, in content-based recommender
systems, the content plays a primary role in the recommendation process, in which the
ratings of users and the attribute descriptions of items are leveraged in order to make
predictions. The basic idea is that user interests can be modeled on the basis of properties
(or attributes) of the items they have rated or accessed in the past. A different framework
is that of knowledge-based systems, in which users interactively specify their interests,
and the user specification is combined with domain knowledge to provide
recommendations. In advanced models, contextual data, such as temporal information,
external knowledge, location information, social information, or network information,
may be used.

There are several types of recommender systems, each with its own approach
to generating recommendations. The basic taxonomy of recommender systems
includes:


a. Content-Based Recommender Systems:


• Overview: Content-based systems recommend items based on the features of the items
themselves and the preferences expressed by the user.
• Key Components: The system analyzes the content of items and creates user profiles
based on the features of items the user has liked or interacted with in the past.

b. Collaborative Filtering Recommender Systems:

• Overview: Collaborative filtering relies on user-item interactions and
recommendations from other users with similar preferences to make predictions for
a target user.
• Types:
o User-Based Collaborative Filtering: Recommends items based on the
preferences of users who are similar to the target user.
o Item-Based Collaborative Filtering: Recommends items that are
similar to those the user has liked or interacted with in the past.

c. Hybrid Recommender Systems:


• Overview: Hybrid systems combine multiple recommendation techniques to
overcome the limitations of individual methods, providing more accurate and diverse
recommendations.
• Types:
o Weighted Hybrid: Assigns different weights to recommendations from
different methods and combines them.
o Switching Hybrid: Switches between different recommendation methods based on
certain conditions or user interactions.
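
The two hybrid strategies above can be sketched in a few lines. This is only an illustration under stated assumptions: the score dictionaries, the weights 0.7 and 0.3, and the `min_ratings` threshold are hypothetical values chosen for the example, not part of any particular system.

```python
def weighted_hybrid(cf_scores, cb_scores, w_cf=0.7, w_cb=0.3):
    """Blend two item -> score dictionaries using fixed (assumed) weights."""
    items = set(cf_scores) | set(cb_scores)
    return {
        item: w_cf * cf_scores.get(item, 0.0) + w_cb * cb_scores.get(item, 0.0)
        for item in items
    }


def switching_hybrid(cf_scores, cb_scores, n_user_ratings, min_ratings=5):
    """Fall back to content-based scores when the user has little history."""
    return cb_scores if n_user_ratings < min_ratings else cf_scores


# Hypothetical scores from a collaborative (cf) and a content-based (cb) model.
cf = {"A": 0.9, "B": 0.2}
cb = {"A": 0.4, "C": 0.8}

blended = weighted_hybrid(cf, cb)
chosen_for_new_user = switching_hybrid(cf, cb, n_user_ratings=1)
```

In the weighted variant every method always contributes, while the switching variant uses exactly one method at a time, selected here by how many ratings the user has.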
There are some more types of recommender systems. They are:


i. Matrix Factorization Recommender Systems:


• Overview: Matrix factorization models decompose the user-item interaction matrix
into latent factors, allowing the system to make predictions based on these factors.

• Example: Singular Value Decomposition (SVD) and Alternating Least Squares
(ALS) are common matrix factorization techniques used in recommender systems.
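
As a rough illustration of decomposing an interaction matrix into latent factors, the sketch below applies a truncated SVD with NumPy. The ratings are made up, and the matrix is assumed to be fully observed, which is a simplification: plain SVD cannot handle missing ratings directly.

```python
import numpy as np

# Hypothetical, fully observed ratings: 4 users x 4 items.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Thin SVD: R = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k strongest latent factors.
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Approximation error in the Frobenius norm.
error = np.linalg.norm(R - R_k)
```

Keeping only `k` factors gives a low-rank approximation of `R`; the discarded singular values determine how much is lost.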

ii. Context-Aware Recommender Systems:


• Overview: Context-aware systems take into account additional contextual
information, such as time, location, or user activity, to enhance the relevance of
recommendations.

• Example: Recommending different movies based on the time of day or suggesting
nearby restaurants based on a user's location.

iii. Knowledge-Based Recommender Systems:


• Overview: Knowledge-based systems recommend items by taking into account
explicit knowledge about user preferences, requirements, and item characteristics.

• Example: Recommending educational courses based on a user's career goals and
academic background.

iv. Deep Learning-Based Recommender Systems:

• Overview: Deep learning techniques, such as neural networks, are employed to model
complex patterns and dependencies in user-item interactions for more accurate
recommendations.

• Example: Neural collaborative filtering models that use embeddings to represent
users and items.

GOALS OF RECOMMENDER SYSTEMS


The recommendation problem can be formulated in two primary ways:
1. Prediction version of problem: The first approach is to predict the rating value for a
user-item combination. It is assumed that training data is available, indicating user
preferences for items. For m users and n items, this corresponds to an incomplete 𝑚 × 𝑛
matrix, where the specified (or observed) values are used for training. The missing (or
unobserved) values are predicted using this training model. This problem is also referred
to as the matrix completion problem because we have an incompletely specified matrix
of values, and the remaining values are predicted by the learning algorithm.
2. Ranking version of problem: In practice, it is not necessary to predict the ratings of
users for specific items in order to make recommendations to users. Rather, a merchant
may wish to recommend the top-k items for a particular user, or determine the top-k
users to target for a particular item. The determination of the top-k items is more
common than the determination of top-k users, although the methods in the two cases
are exactly analogous.
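
To make the ranking version concrete, the sketch below assumes that some model has already produced predicted scores for every user-item pair. The numbers and the `observed` mask are hypothetical. It shows that picking the top-k items for a user and the top-k users for an item are the same operation applied along different axes.

```python
import numpy as np

# Hypothetical predicted ratings for 3 users x 5 items.
predicted = np.array([
    [4.5, 3.0, 2.0, 4.8, 1.0],
    [1.5, 4.9, 4.0, 2.0, 3.5],
    [3.0, 3.2, 4.7, 1.0, 4.6],
])

# True where the user has already rated the item (those pairs are skipped).
observed = np.array([
    [True, False, False, False, False],
    [False, True, False, False, False],
    [False, False, False, False, True],
])


def top_k_items(user, k):
    """Indices of the k highest-scoring unseen items for one user."""
    scores = np.where(observed[user], -np.inf, predicted[user])
    return np.argsort(-scores)[:k].tolist()


def top_k_users(item, k):
    """Indices of the k users predicted to like one item the most."""
    scores = np.where(observed[:, item], -np.inf, predicted[:, item])
    return np.argsort(-scores)[:k].tolist()
```

For example, `top_k_items(0, 2)` ranks along a row of the matrix, while `top_k_users(2, 2)` ranks along a column.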

Increasing product sales is the primary goal of a recommender system. Recommender
systems are, after all, utilized by merchants to increase their profit. By recommending
carefully selected items to users, recommender systems bring relevant items to the
attention of users. This increases the sales volume and profits for the merchant.
In order to achieve the broader business-centric goal of increasing revenue, the
common operational and technical goals of recommender systems are as follows:
• Relevance: The most obvious operational goal of a recommender system is to
recommend items that are relevant to the user at hand. Users are more likely to
consume items they find interesting. Although relevance is the primary operational
goal of a recommender system, it is not sufficient in isolation. Therefore, we discuss
several secondary goals below, which are not quite as important as relevance but are
nevertheless important enough to have a significant impact.
• Novelty: Recommender systems are truly helpful when the recommended item is
something that the user has not seen in the past. For example, popular movies of a
preferred genre would rarely be novel to the user. Repeated recommendation of
popular items can also lead to reduction in sales diversity.
• Serendipity: A related notion is that of serendipity, wherein the items recommended are
somewhat unexpected, and therefore there is a modest element of lucky discovery,
as opposed to obvious recommendations. Serendipity is different from novelty in
that the recommendations are truly surprising to the user, rather than simply
something they did not know about before. It may often be the case that a particular
user may only be consuming items of a specific type, although a latent interest in
items of other types may exist which the user might themselves find surprising.
Unlike novelty, serendipitous methods focus on discovering such recommendations.
• Increasing recommendation diversity: Recommender systems typically suggest a
list of top-k items. When all these recommended items are very similar, it increases
the risk that the user might not like any of these items. On the other hand, when the
recommended list contains items of different types, there is a greater chance that the
user might like at least one of these items. Diversity has the benefit of ensuring that
the user does not get bored by repeated recommendation of similar items.
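
One common way to quantify the diversity of a recommended list is the average pairwise dissimilarity of its items. The sketch below assumes each item is described by a set of tags and uses Jaccard similarity; the item names and tags are invented for illustration, and the list must contain at least two items.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two non-empty sets of tags."""
    return len(a & b) / len(a | b)


def intra_list_diversity(items, features):
    """Average pairwise dissimilarity (1 - Jaccard) over a list of items."""
    pairs = list(combinations(items, 2))
    return sum(1 - jaccard(features[a], features[b]) for a, b in pairs) / len(pairs)


# Hypothetical item tags.
features = {
    "m1": {"sci-fi", "action"},
    "m2": {"sci-fi", "action"},
    "m3": {"documentary", "history"},
}

similar_list = intra_list_diversity(["m1", "m2"], features)
mixed_list = intra_list_diversity(["m1", "m2", "m3"], features)
```

A list of near-identical items scores close to 0, and a list that mixes item types scores higher.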

Examples of Recommender Systems


In today’s digital age, we are bombarded with vast information and choices. From
online shopping to streaming services, it can be overwhelming to navigate through the
plethora of options available. This is where recommender systems come in – they help
us make sense of the endless choices by suggesting relevant options based on our
interests and preferences.

1. Netflix

Netflix’s recommendation engine is perhaps the most well-known and widely
used recommender system. It uses an algorithm to analyze a user’s viewing history,
ratings, and search behavior to suggest movies and TV shows that the user is likely to
enjoy. The algorithm takes into account the genre, the actors, the director, and other
factors to make personalized recommendations for each user.

2. Amazon
Amazon’s recommendation engine suggests products based on a user’s purchase
history, search history, and browsing behavior. It makes personalized recommendations
based on the user’s prior purchases, products viewed, and items added to their shopping
cart.
3. Spotify
Spotify’s music recommendation system suggests songs, playlists, and albums
depending on a user’s listening history, liked songs, and search history. It tailors
recommendations based on the user’s listening habits, favorite genres, and favorite
artists.
4. YouTube
YouTube’s recommendation engine suggests videos based on a user’s viewing
history, liked videos, and search history. The algorithm considers factors such as the
user’s favourite channels, the length of time spent watching a video, and other viewing
habits to make personalized recommendations.
5. LinkedIn
LinkedIn’s recommendation engine suggests jobs, connections, and content based on
a user’s profile, skills, and career history. To make personalized recommendations, the
algorithm takes into account the user’s job title, industry, and location.
6. Zillow
Zillow’s recommendation system suggests real estate properties based on a user’s
search history and preferences. Users can receive personalized recommendations based
on their budget, location, and desired features.
7. Airbnb
Airbnb’s recommendation system suggests accommodations based on a user’s search
history, preferences, and reviews. Personal recommendations are made based on factors
such as the user’s travel history, location, and desired amenities.
8. Uber
Uber’s recommendation system suggests ride options based on a user’s previous
rides and preferred options. When recommending rides, the algorithm considers factors
such as the user’s preferred vehicle type, location, and other preferences.

9. Google Maps
Google Maps’ recommendation system suggests places to visit, eat, and shop based on
a user’s search history and location. Personalized recommendations are generated based
on factors such as the user’s location, time of day, and preferences.
10. Goodreads
Goodreads’ recommendation engine suggests books based on a user’s reading history,
ratings, and reviews. To provide personalized recommendations, the algorithm
considers factors such as the user’s reading habits, genres, and favorite authors.
From online shopping to entertainment and travel, these systems have significantly
improved the user experience by suggesting relevant options based on our interests and
preferences. The success of these real-world examples showcases the power and
effectiveness of recommender systems in various industries. With advancements in
artificial intelligence, recommender systems are expected to become even more accurate and
personalized in the future.

PERSONALIZED AND NON-PERSONALIZED RECOMMENDER SYSTEMS:

Personalized and non-personalized recommender systems are two broad categories of
recommendation engines that differ in their approach to generating recommendations. While
personalized systems tailor recommendations to individual users based on their preferences
and behavior, non-personalized systems provide the same recommendations to all users,
regardless of their individual characteristics. Let's delve into each category:

A. Traditional Recommender Systems:


• Overview: Traditional recommender systems typically use explicit input features or
rules to generate recommendations. These systems often rely on general attributes of
items or users, and their recommendations are not personalized to the specific
preferences or behavior of individual users.
• Example Features:
o Genre of a movie
o Author of a book
o Popularity or overall ratings of items
• Methodology:
o Recommendations are made based on fixed criteria or predetermined
rules.
o Users receive the same recommendations regardless of their unique
preferences.
• Advantages:
o Simplicity and ease of implementation.
o Less reliance on individual user data.
o Suitable for scenarios where personalization is not a critical factor.

B. Non-personalized Recommender Systems:


• Overview: Non-personalized recommender systems provide the same set of
recommendations to all users, without considering individual user preferences or
behaviors. These systems often focus on providing popular or trending items that are
likely to appeal to a broad audience.
• Examples:
o "Top 10" lists or rankings
o Bestsellers
o Most viewed items
• Methodology:
o Recommendations are based on aggregate data, such as overall popularity or
global trends.
o All users receive identical recommendations.
• Advantages:
o Easy to implement and computationally efficient.
o Applicable in scenarios where personalization is not feasible or necessary.
o Can be effective for new users or when limited user data is available.
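
A non-personalized "most popular" recommender can be sketched with a simple counter over the interaction log. The log below is invented for illustration; a real system would read it from its own data store.

```python
from collections import Counter

# Hypothetical interaction log of (user, item) pairs.
interactions = [
    ("u1", "book_a"), ("u2", "book_a"), ("u3", "book_a"),
    ("u1", "book_b"), ("u2", "book_b"),
    ("u3", "book_c"),
]


def most_popular(interactions, n):
    """Return the n most frequently interacted-with items, most popular first."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(n)]


top_2 = most_popular(interactions, 2)
```

The function never looks at who is asking, so every user receives the same list.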

• While traditional and non-personalized recommender systems have their advantages,
they may lack the ability to provide highly relevant and tailored recommendations that
reflect individual user preferences. As a result, these systems are often contrasted with
personalized recommender systems, which leverage user-specific data to deliver more
accurate and targeted suggestions.

PERSONALIZED RECOMMENDER SYSTEMS

Personalized recommendation systems are designed to provide tailored recommendations
to individual users based on their past behavior, preferences, and demographic
information.

Based on the user’s data such as purchases or ratings, personalized recommenders try to
understand and predict what items or content a specific user is likely to be interested in. In that
way, every user will get customized recommendations.

What makes a good recommendation?

• Is personalized (relevant to that user),
• Is diverse (includes different user interests),
• Doesn’t recommend the same items to users for the second time, and
• Recommends available products at the right time.


There are a few types of personalized recommendation systems, including content-based
filtering, collaborative filtering, and hybrid recommenders.

TYPES OF PERSONALIZED RECOMMENDER SYSTEMS

Personalized recommender systems can be categorized into several types, each with its own
methods and techniques for providing tailored recommendations.

These include:

• Content-based filtering,
• Collaborative filtering, and
• Hybrid recommenders.
CONTENT-BASED FILTERING

• Content-based recommender systems use item or user metadata to create specific
recommendations. To do this, we look at the user’s purchase history.
• For example, if a user has already read a book by a certain author or bought a product
from a certain brand, you assume that they have a preference for that author or that brand
and that they are likely to buy a similar product in the future.


Let’s assume that Jenny loves sci-fi books and her favorite writer is Walter Jon Williams. If
she reads the Aristoi book, then her recommended book will be Angel Station, also a sci-fi
book written by Walter Jon Williams.
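
The Jenny example can be sketched as a tiny content-based recommender. The attribute vectors are assumptions made up for illustration: each book is encoded by three hand-picked features, and the two extra titles are hypothetical placeholders.

```python
import numpy as np

# Assumed feature order: [sci-fi, fantasy, written by Walter Jon Williams].
books = {
    "Aristoi": np.array([1.0, 0.0, 1.0]),
    "Angel Station": np.array([1.0, 0.0, 1.0]),
    "Other sci-fi title": np.array([1.0, 0.0, 0.0]),  # hypothetical book
    "Fantasy title": np.array([0.0, 1.0, 0.0]),       # hypothetical book
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend(read, k=1):
    """Rank unread books by similarity to the mean vector of the books read."""
    profile = np.mean([books[title] for title in read], axis=0)
    candidates = [title for title in books if title not in read]
    ranked = sorted(candidates, key=lambda t: cosine(profile, books[t]), reverse=True)
    return ranked[:k]


jenny_next = recommend(["Aristoi"])
```

Because the profile is built only from item attributes, the recommender needs no information about other users.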

Pros and cons of the content-based approach

The content-based approach is one of the common techniques used in personalized
recommendation systems. It has its advantages and disadvantages, which are important to
consider when deciding to implement this approach.

Advantages

• Reduced cold-start problem: Content-based recommendations can mitigate the
“cold-start” problem, so that new users or items with limited interaction history can
still be included in relevant recommendations.
• Transparency: Content-based filtering allows users to understand why a
recommendation is made because it’s based on the content and attributes of items
they’ve previously interacted with.
• Diversity: Considering various attributes, content-based systems can provide diverse
recommendations. For example, in a movie recommendation system, recommendations
can be based on genre, director, and actors.
• Reduced data privacy concerns: Since content-based systems primarily use item
attributes, they may not require as much user data, which can mitigate privacy concerns
associated with collecting and storing user data.
Disadvantages of the content-based approach

• The “Filter bubble”: Content filtering can recommend only content similar to the user’s
past preferences. If a user reads a book about a political ideology and books related to that
ideology are recommended to them, they will be in the “bubble of their previous interests”.
• Limited serendipity: Content-based systems may have limited capability to recommend
items that are outside a user’s known preferences.
• In the first case scenario, 20% of items attract the attention of 70-80% of users and 70-80%
of items attract the attention of 20% of users. The recommender’s goal is to surface the other
products that users would not notice at first glance.
• In the second case scenario, content-based filtering recommends products that are fitting
content-wise, yet very unpopular (i.e. people don’t buy those products for some reason, for
example, the book is bad even though it fits thematically).
• Over-specialization: If the content-based system relies too heavily on a user’s past
interactions, it can recommend items that are too similar to what the user has already seen
or interacted with, potentially missing opportunities for diversification.
COLLABORATIVE FILTERING

• Collaborative filtering is a popular technique used to provide personalized
recommendations to users based on the behavior and preferences of similar users.

• The fundamental idea behind collaborative filtering is that users who have interacted
with items in similar ways or have had similar preferences in the past are likely to have
similar preferences in the future, too.
• Collaborative filtering relies on the collective wisdom of the user community to
generate recommendations.
There are two main types of collaborative filtering: memory-based and model-based.

Memory-based recommenders

• Memory-based recommenders rely on the direct similarity between users or items to
make recommendations.
• Usually, these systems use raw, historical user interaction data, such as user-item ratings
or purchase histories, to identify similarities between users or items and generate
personalized recommendations.
• The biggest disadvantage of memory-based recommenders is that they require a lot of
data to be stored and comparing every item/user with every item/user is extremely
computationally demanding.

• Memory-based recommenders can be categorized into two main types: user-based and
item-based collaborative filtering.
A user-based collaborative filtering recommender system

• With the user-based approach, recommendations to the target user are made by
identifying other users who have shown similar behavior or preferences. This translates
to finding users who are most similar to the target user based on their historical
interactions with items. This could be “users who are similar to you also liked…” type
of recommendations.
• But if we say that users are similar, what does that mean?

• Let’s say that Jenny and Tom both love sci-fi books. This means that, when a new sci-
fi book appears and Jenny buys that book, that same book will be recommended to Tom,
since he also likes sci-fi books.
An item-based collaborative filtering recommender system

• In item-based collaborative filtering, recommendations are made by identifying items
that are similar to the ones the target user has already interacted with.
• The idea is to find items that share similar user interactions and recommend those items
to the target user. This can include “users who liked this item also liked…” type of
recommendations.
• To illustrate with an example, let’s assume that John, Robert, and Jenny highly rated
sci-fi books Fahrenheit 451 and The Time Machine, giving them 5 stars. So, when Tom
buys Fahrenheit 451, the system automatically recommends The Time Machine to him
because it has identified it as similar based on other users’ ratings.
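
The example above can be sketched by comparing the rating columns of the items. The matrix is an assumption built around the story: a third, hypothetical book is added for contrast, Tom's purchase is treated as a 5-star rating, and missing ratings are encoded as 0, which is a simplification.

```python
import numpy as np

users = ["John", "Robert", "Jenny", "Tom"]
items = ["Fahrenheit 451", "The Time Machine", "Hypothetical Cookbook"]

# Rows are users, columns are items; 0 means "no rating" (assumed encoding).
R = np.array([
    [5, 5, 0],
    [5, 5, 1],
    [5, 5, 0],
    [5, 0, 0],
], dtype=float)

# Item-item cosine similarity computed from the rating columns.
norms = np.linalg.norm(R, axis=0)
item_sim = (R.T @ R) / np.outer(norms, norms)


def similar_unrated(user, liked_item):
    """Return the unrated item whose rating column is most similar to liked_item."""
    u, i = users.index(user), items.index(liked_item)
    candidates = [j for j in range(len(items)) if j != i and R[u, j] == 0]
    best = max(candidates, key=lambda j: item_sim[i, j])
    return items[best]


for_tom = similar_unrated("Tom", "Fahrenheit 451")
```

The two sci-fi titles end up similar only because the same users rated them highly; no genre information is used.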
How to calculate user-user and item-item similarities?

• Unlike the content-based approach where metadata about users or items is used, in the
collaborative filtering memory-based approach we are looking at the user’s behavior
e.g. whether the user liked or rated an item or whether the item was liked or rated by a
certain user.
• For example, the idea is to recommend Robert the new sci-fi book. Let’s look at the
steps in this process:
• Create a user-item-rating matrix.
• Create a user-user similarity matrix: Cosine similarity is calculated (alternatives:
adjusted cosine similarity, Pearson similarity, Spearman rank correlation) between
every two users. This is how we get a user-user matrix. When there are fewer users than
items, this matrix is smaller than the initial user-item-rating matrix.

• Look up similar users: In the user-user matrix, we observe users that are most similar
to Robert.
• Candidate generation: When we find Robert’s most similar users, we look at all the
books these users read and the ratings they gave them.
• Candidate scoring: Depending on the other users’ ratings, books are ranked from the
ones they liked the most, to the ones they liked the least. The results are normalized on
a scale from 0 to 1.
• Candidate filtering: We check if Robert has already bought any of these books and
eliminate those he already read.
• The item-item similarity calculation is done in an identical way and has all the same
steps as user-user similarity.
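
The steps above can be sketched end to end on a toy matrix. Everything concrete here is an assumption: the ratings, the book names, the choice of two neighbours, and the use of a similarity-weighted average for scoring. Missing ratings are stored as 0 and counted as 0 in the average, which is a simplification.

```python
import numpy as np

users = ["Robert", "Jenny", "Tom", "John"]
books = ["Book 1", "Book 2", "Book 3", "Book 4"]  # hypothetical titles

# Step 1: user-item-rating matrix (0 = not rated).
R = np.array([
    [5, 4, 0, 0],  # Robert
    [5, 5, 4, 0],  # Jenny
    [4, 5, 5, 1],  # Tom
    [1, 0, 2, 5],  # John
], dtype=float)

# Step 2: user-user cosine similarity matrix.
norms = np.linalg.norm(R, axis=1, keepdims=True)
user_sim = (R @ R.T) / (norms @ norms.T)


def recommend(user, n_neighbors=2, k=2):
    u = users.index(user)
    # Step 3: look up the most similar users, excluding the user themselves.
    sims = user_sim[u].copy()
    sims[u] = -np.inf
    neighbors = np.argsort(-sims)[:n_neighbors]
    # Steps 4-5: similarity-weighted mean of the neighbours' ratings,
    # rescaled to the 0-1 range by dividing by the maximum rating.
    weights = user_sim[u, neighbors]
    scores = (weights @ R[neighbors]) / weights.sum() / R.max()
    # Step 6: filter out books the user has already rated.
    scores[R[u] > 0] = -np.inf
    ranked = np.argsort(-scores)[:k]
    return [books[j] for j in ranked if np.isfinite(scores[j])]


for_robert = recommend("Robert")
```

Running the same code on the transposed matrix gives the item-item variant.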
Model-based recommenders

• Model-based recommenders make use of machine learning models to generate
recommendations.
• These systems learn patterns, correlations, and relationships from historical user-item
interaction data to make predictions about a user’s preferences for items they haven’t
interacted with yet.
• There are different types of model-based recommenders, such as matrix factorization
(for example, Singular Value Decomposition, SVD) and neural networks.
• Matrix factorization is among the most widely used, so let’s explore it a bit
further.

Matrix factorization

• Matrix factorization is a mathematical technique used to decompose a large matrix into the
product of multiple smaller matrices.
• In the context of recommender systems, matrix factorization is commonly employed to
uncover latent patterns or features in user-item interaction data, allowing for personalized
recommendations. Latent information can be inferred by analyzing user behavior.
• If there is feedback from the user, for example – they have watched a particular movie or
read a particular book and have given a rating, that can be represented in the form of a
matrix. In this case,

o Rows represent users,
o Columns represent items, and

o The values in the matrix represent user-item interactions (e.g., ratings, purchase
history, clicks, or binary preferences).
Since it’s almost impossible for the user to rate every item, this matrix will have many
unfilled values. This is called sparsity.
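
As a small illustration, sparsity can be measured as the fraction of missing entries in the interaction matrix. The matrix below is made up, and the use of NaN to mark a missing rating is an assumption of this sketch.

```python
import numpy as np

# Hypothetical 3 users x 4 items matrix; NaN marks a missing rating.
R = np.array([
    [5, np.nan, np.nan, 1],
    [np.nan, 4, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 2],
])

# Fraction of user-item pairs without a rating.
sparsity = float(np.isnan(R).mean())
```

Here 8 of the 12 entries are missing, so about two thirds of the matrix is empty.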

The matrix factorization process

Matrix factorization aims to approximate this interaction matrix by factorizing it into two or
more lower-dimensional matrices:

• User latent factor matrix (U), which contains information about users and their
relationships with latent factors.
• Item latent factor matrix (V), which contains information about items and their
relationships with latent factors.
The rating matrix is approximated by the product of two smaller matrices: the user-feature
matrix and the item-feature matrix. The higher the predicted score in the product, the better
the match between the item and the user.

The matrix factorization process includes the following steps:

• Initialization of random user and item matrices,
• The predicted ratings matrix is obtained by multiplying the user and the transposed item matrix,
• The goal of matrix factorization is to minimize the loss function (the difference in the
ratings of the predicted and actual matrices must be minimal). Each rating can be
described as a dot product of a row in the user matrix and a column in the item matrix.
Minimization of loss function
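
The formula itself does not appear in this text. A commonly used regularized squared-error objective that is consistent with the symbols K, r(u, i), and λ used in these notes is sketched below. The notation $U_u$ and $V_i$ for the rows of U and V belonging to user u and item i is an assumption introduced here.

```latex
\min_{U,\,V} \; \sum_{(u,i) \in K}
  \Big[ \big( r(u,i) - U_u V_i^{\top} \big)^2
        + \lambda \big( \lVert U_u \rVert^2 + \lVert V_i \rVert^2 \big) \Big]
```

The first term penalizes the gap between the actual rating and the dot product of the user and item factors, and the second term keeps the factors small.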

• Here K is the set of (u, i) pairs for which a rating is known, r(u, i) is the rating for item i by user u, and λ is a
regularization term (used to avoid overfitting).
• In order to minimize the loss function we can apply Stochastic Gradient Descent (SGD) or
Alternating Least Squares (ALS). Both methods can be used to incrementally update
the model as new ratings come in. SGD is simple to implement and cheap per update, while ALS
is easier to parallelize; which one is faster or more accurate depends on the data and tuning.
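
A minimal SGD sketch of matrix factorization is shown below, assuming a regularized squared-error loss over the known ratings. The ratings, the number of factors, the learning rate, the regularization strength, and the number of epochs are all hypothetical values chosen for a toy example rather than recommended settings.

```python
import numpy as np


def factorize_sgd(ratings, n_users, n_items, n_factors=2,
                  lr=0.01, reg=0.02, n_epochs=500, seed=0):
    """Learn user factors U and item factors V from (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    # Initialization of random user and item matrices.
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    for _ in range(n_epochs):
        # Visit the known ratings in a random order each epoch.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - U[u] @ V[i]  # prediction error for this rating
            u_old = U[u].copy()    # keep the old value for the item update
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error over the given (user, item, rating) triples."""
    return float(np.sqrt(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings])))


# Hypothetical known ratings as (user index, item index, rating).
ratings = [
    (0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 1, 5.0), (1, 2, 1.0),
    (2, 2, 5.0), (2, 3, 4.0), (3, 2, 4.0), (3, 3, 5.0), (3, 0, 1.0),
]

U, V = factorize_sgd(ratings, n_users=4, n_items=4)
train_error = rmse(ratings, U, V)
predicted = U @ V.T  # includes predictions for the unobserved pairs
```

Each update touches only one row of `U` and one row of `V`, which is why a new rating can be folded in without retraining from scratch.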
Advantages of collaborative filtering

• Effective personalization: Collaborative filtering is highly effective in providing
personalized recommendations to users. It takes into account the behavior and preferences
of similar users to suggest items that a particular user is likely to enjoy.
• No need for item attributes: Collaborative filtering works solely based on user-item
interactions, making it applicable to a wide range of recommendation scenarios where item
features may be sparse or unavailable. This is especially useful on platforms whose content
is hard to describe with structured attributes.
• Serendipitous (unanticipated) discoveries: Collaborative filtering can introduce users to
items they might not have discovered otherwise. By analyzing user behaviors and
identifying patterns across the user community, collaborative filtering can recommend
items that align with a user’s tastes but may not be immediately obvious to them.
Disadvantages of collaborative filtering

It’s important to note that while collaborative filtering offers these and other advantages, it also
has its limitations, including:

The “cold-start” problem:

• User cold start occurs when a new user joins the system without any prior interaction
history. Collaborative filtering relies on historical interactions to make
recommendations, so it can’t provide personalized suggestions to new users who start
with no data.
• Item cold start happens when a new item is added, and there’s no user interaction data
for it. Collaborative filtering has difficulty recommending new items since it lacks
information about how users have engaged with these items in the past.
• Sensitivity to sparse data: Collaborative filtering depends on having enough user-item
interaction data to provide meaningful recommendations. In situations where data is
sparse and users interact with only a small number of items, collaborative filtering may
struggle to find useful patterns or similarities between users and items.
• Potential for popularity bias: Collaborative filtering tends to recommend popular
items more frequently. This can lead to a “rich get richer” phenomenon, where already
popular items receive even more attention, while niche or less-known items are
overlooked.
• To address these and other limitations, recommendation systems often use hybrid
approaches that combine collaborative filtering with content-based methods or other
techniques to improve recommendation quality in the long run.

HYBRID RECOMMENDERS

• Hybrid recommendation systems combine multiple recommendation techniques or
approaches to provide more accurate, diverse, and effective personalized
recommendations.
• They are particularly valuable in real-world recommendation scenarios because they
can provide more robust, accurate, and adaptable recommendations.
• The choice of which hybrid approach to use depends on the specific requirements and
constraints of the recommendation system and the nature of the available data.
Pros of hybrid recommenders
Some of the most common advantages of hybrid recommenders include:
• Improved recommendation quality: Hybrid recommenders leverage multiple
recommendation techniques, combining their strengths to provide more accurate and
diverse recommendations. This often results in higher recommendation quality
compared to individual methods, benefiting users by offering more relevant
suggestions.
• Enhanced robustness and flexibility: Hybrid models are often more robust in
handling various recommendation scenarios. They can adapt to different data
characteristics, user behaviors, and recommendation challenges. This flexibility is
valuable in real-world recommendation systems.
• Addressing common recommendation limitations: Hybrid recommenders can
mitigate the limitations of individual recommendation techniques. For example, they
can overcome the “cold-start” problem for new users and items by incorporating
content-based recommendations, providing serendipitous suggestions, and reducing
popularity bias.
Cons of hybrid recommenders
Just like all other recommender systems, hybrid recommenders have their downsides,
too. Some include:
• Increased complexity and development effort: Implementing and maintaining hybrid
recommendation systems can be more complex and resource-intensive. It requires
expertise in multiple recommendation techniques and careful integration of these
methods.
• Data and computational demands: Hybrid models often require more data and
computational resources because they use multiple recommendation algorithms. This
can be challenging, especially in large-scale systems with vast user-item interactions
and a diverse catalog of items.
• Tuning and parameter sensitivity: Hybrid recommenders may involve a greater
number of parameters and hyperparameters that need to be fine-tuned. Yet, ensuring
optimal parameter settings for each recommendation component can be challenging and
time-consuming.
• While hybrid recommenders offer significant advantages in terms of recommendation
quality and versatility, you should carefully consider the trade-offs and resource
requirements when deciding which system to implement.

• This is the best way to ensure that the benefits of hybridization outweigh the added
complexity and costs.

EVALUATION METRICS FOR RECOMMENDER SYSTEMS


• To assess the performance and effectiveness of recommender systems, you have to take
into consideration certain evaluation metrics.
• They can help you measure how well a recommendation algorithm or model is
performing and provide insights into its strengths and weaknesses.
• There are several categories of evaluation metrics, depending on the specific aspect of
recommendations being assessed.

Some common evaluation metrics include:


• Accuracy metrics assess the accuracy of the recommendations made by a system in
terms of how well they match the user’s actual preferences or behavior. Here we have
Mean Absolute Error (MAE), Root Mean Square Error (RMSE), or Mean Squared
Logarithmic Error (MSLE).
• Ranking metrics evaluate how well a recommender system ranks items for a user,
especially in top-N recommendation scenarios. Think of hit rate, average reciprocal hit
rate (ARHR), cumulative hit rate, or rating hit rate.
• Diversity metrics assess the diversity of recommended items to ensure that
recommendations are not overly focused on a narrow set of items. These include Intra-
List Diversity or Inter-List Diversity.
• Novelty metrics evaluate how well a recommender system introduces users to new or
unfamiliar items. Catalog coverage and item popularity belong to this category.
• Serendipity metrics assess the system’s ability to recommend unexpected but
interesting items to users – surprise or diversity are looked at in this case.
You can also choose to look at some business metrics such as conversion rate, click-through
rate (CTR), or revenue impact. But, ultimately, the best way to do an online evaluation
of your recommender system is through A/B testing.

OVERVIEW OF DATA MINING METHODS:


Recommender Systems (RS) typically apply techniques and methodologies from
neighboring areas such as Human Computer Interaction (HCI) or Information Retrieval
(IR). However, most of these systems have at their core an algorithm that can be understood
as a particular instance of a Data Mining (DM) technique.
Data mining methods play a crucial role in building effective recommender systems by
extracting patterns and insights from large datasets. One key aspect of recommender systems
involves measuring similarity between users, items, or both. The following is an overview of
data mining methods for recommender systems and common similarity measures:
Data Mining Methods for Recommender Systems:
1. Association Rule Mining:
• Overview: Association rule mining identifies relationships or patterns in user-item
interactions. It helps discover associations between items that are frequently co-
purchased or co-viewed.
• Application: Generating recommendations based on association rules, e.g., "Users
who bought X also bought Y."

2. Clustering Algorithms:
• Overview: Clustering methods group users or items with similar characteristics.
Users or items within the same cluster are likely to share common preferences.
• Application: Recommending items popular within a user's cluster, assuming similar
preferences within the group.

3. Classification Algorithms:
• Overview: Classification models predict user preferences for items based on historical
interactions. These models can be trained to classify items as relevant or irrelevant to
a user.
• Application: Providing recommendations by predicting user preferences for items not
yet interacted with.

4. Matrix Factorization:
• Overview: Matrix factorization techniques decompose the user-item interaction matrix
into latent factors, capturing hidden patterns and relationships. Singular Value
Decomposition (SVD) and Alternating Least Squares (ALS) are common matrix
factorization methods.
• Application: Predicting missing values in the user-item matrix to recommend items a
user might like.

5. Deep Learning Models:


• Overview: Deep learning models, such as neural networks, can capture complex
patterns in user-item interactions. Neural collaborative filtering is an example where
embeddings are used to represent users and items.
• Application: Learning intricate user-item relationships for more accurate and
personalized recommendations.

Similarity Measures:
Different data types require different functions to measure the similarity of data points.
Differentiating between unary, binary and quantitative data helps with most problems. Unary
data could be the number of likes for a blog post. Binary data could be likes and dislikes of a
video and quantitative data could be rating provided like 4/10 stars or similar. The following
table summarises which similarity functions are suitable for different data types.

1. Cosine Similarity:
• Definition: Measures the cosine of the angle between two vectors, representing users
or items, in a multidimensional space.
• Cosine similarity is a measure used to determine the similarity between two non-zero
vectors in a vector space. It calculates the cosine of the angle between the vectors,
which reflects how closely their orientations agree:
$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

• A · B denotes the dot product of vectors A and B, which is the sum of the element-
wise multiplication of their corresponding components.
• ||A|| represents the Euclidean norm or magnitude of vector A, calculated as the square
root of the sum of the squares of its components.
• ||B|| represents the Euclidean norm or magnitude of vector B.
The resulting value ranges from -1 to 1, where 1 indicates that the vectors are in the same
direction (i.e., completely similar), -1 indicates they are in opposite directions (i.e.,
completely dissimilar), and 0 indicates they are orthogonal or independent (i.e., no
similarity). It is particularly useful in scenarios where the magnitude of the vectors is not
significant, and the focus is on the direction or relative orientation of the vectors.
Magnitude Independence: Cosine similarity is not affected by the magnitude or length of
the vectors. It focuses solely on their direction or orientation. This property makes it valuable
when dealing with high-dimensional data or sparse vectors, where the magnitude of the
vectors may not be as informative as their relative angles or orientations.
Sparse Data: It is particularly effective when working with sparse data, where vectors have
many zero or missing values. In such cases, the non-zero elements play a crucial role in
capturing the meaningful information and similarity between vectors.
• Application: In recommender systems, cosine similarity can be used to measure the
similarity between user preferences or item characteristics, aiding in generating
personalised recommendations based on similar user preferences or item profiles.
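A minimal sketch of the cosine formula described above, in plain Python. The function name `cosine_similarity` and the example vectors are illustrative assumptions; the zero-vector check reflects the fact that the measure is only defined for non-zero vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))       # A · B
    norm_a = math.sqrt(sum(x * x for x in a))    # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))    # ||B||
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical rating vectors: the second user rates everything twice as high,
# so the direction is identical even though the magnitudes differ.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 0], [-1, 0])
```

Because rating vectors are usually non-negative, similarities computed on raw ratings fall between 0 and 1 in practice; negative values only appear after mean-centering or with signed features.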
2. Pearson Correlation Coefficient:
• Definition: Measures linear correlation between two variables, providing a measure
of the strength and direction of a linear relationship.
• The Pearson correlation coefficient, also known as Pearson’s correlation or simply
correlation coefficient, is a statistical measure that quantifies the linear relationship
between two variables. It measures how closely the data points of the variables align
on a straight line, indicating the strength and direction of the relationship.

The Pearson correlation coefficient is denoted by the symbol “r” and takes values
between -1 and 1. The coefficient value indicates the following:
• r = 1: Perfect positive correlation. The variables have a strong positive linear
relationship, meaning that as one variable increases, the other variable also
increases proportionally.
• r = -1: Perfect negative correlation. The variables have a strong negative linear
relationship, meaning that as one variable increases, the other variable decreases
proportionally.
• r = 0: No linear correlation. There is no linear relationship between the variables.
Note that they may still be related in a non-linear way, so r = 0 does not by itself
mean that they are independent.
• Application: Evaluating how well users' preferences align, especially in scenarios
with numerical ratings.
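The coefficient can be sketched directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The function name `pearson` and the sample data below are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

r_positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # y grows with x
r_negative = pearson([1, 2, 3], [3, 2, 1])         # y falls as x grows
r_zero = pearson([-1, 0, 1], [1, 0, 1])            # y = x**2: related, yet r = 0
```

The last case shows why r = 0 only rules out a linear relationship: y is fully determined by x, but the relationship is not a straight line.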
3. Jaccard Similarity:

• Definition: Measures the intersection over the union of sets, quantifying the similarity
between two sets.
• It calculates the size of the intersection of the sets divided by the size of their union,
J(A, B) = |A ∩ B| / |A ∪ B|. The resulting value ranges from 0 to 1, where 0 indicates
no similarity and 1 indicates complete similarity.

• In other words, to calculate the Jaccard similarity, you need to determine the common
elements between the sets of interest and divide it by the total number of distinct
elements across both sets.
• It is useful because it provides a straightforward and intuitive measure to quantify the
similarity between sets. Its simplicity makes it applicable in various domains and
scenarios.
• Here are some key reasons for its usefulness:
• Set Comparison: It enables the comparison of sets without considering the
specific elements or their ordering. It focuses on the presence or absence of
elements, making it suitable for cases where the structure or attributes of the
elements are not important or would need additional feature engineering, which
would slow down the system.
• Scale-Invariant: It remains unaffected by the size of the sets being compared.
It solely relies on the intersection and union of sets, making it a robust measure
even when dealing with sets of different sizes.
• Binary Data: It is particularly suitable for binary data, where elements are
either present or absent in the sets. It can be applied to scenarios where the
presence or absence of specific features or attributes is important for
comparison.
• Applications
• In the context of a recommender system, Jaccard similarity can be used to
identify users with similar item preferences and recommend items that are
highly rated or popular among those similar users. By leveraging Jaccard
similarity, the recommender can enhance the personalisation of
recommendations and help users discover relevant items based on the
preferences of users with similar tastes.
• Assessing similarity between sets of items liked or interacted with by users.
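As a small sketch of the intersection-over-union idea, the snippet below compares the sets of items two users have liked. The function name `jaccard_similarity`, the item labels, and the choice to return 0.0 for two empty sets are assumptions of this example.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Hypothetical sets of liked items for two users.
liked_by_alice = {"A", "B", "C"}
liked_by_bob = {"B", "C", "D"}
similarity = jaccard_similarity(liked_by_alice, liked_by_bob)
```

Here the users share two items (B and C) out of four distinct items overall, so the similarity is 2 / 4 = 0.5.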

4. Euclidean Distance:
• Definition: Represents the straight-line distance between two points in a
multidimensional space.
• Application: Quantifying the dissimilarity or proximity between user or item
vectors.

5. Manhattan Distance:

• Definition: Measures the distance between two points by summing the absolute
differences along each dimension.
• Application: Similar to Euclidean distance, but may be less sensitive to outliers.

6. Hamming Distance:
• Definition: Measures the number of positions at which corresponding bits differ in
two binary strings.
• Application: Suitable for comparing binary user profiles or item representations.
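The three distance measures above can be sketched as follows. The function names and sample inputs are illustrative assumptions; note that, unlike the similarity measures, smaller values mean the two points are closer.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of the absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

d_euclid = euclidean_distance([0, 0], [3, 4])
d_manhattan = manhattan_distance([0, 0], [3, 4])
d_hamming = hamming_distance("10110", "10011")
```

For the points (0, 0) and (3, 4) the Euclidean distance is 5 and the Manhattan distance is 7; the two binary strings differ in two positions.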

Choosing the appropriate data mining method and similarity measure depends on the
characteristics of the data, the nature of the recommendation problem, and
computational considerations. Hybrid approaches that combine multiple methods or
measures often yield more robust and accurate recommendations.

DIMENSIONALITY REDUCTION:
Overview:
Dimensionality reduction is a technique used to reduce the number of features
(dimensions) in a dataset while preserving its essential information. In the context of
recommender systems, dimensionality reduction is often applied to user-item interaction
matrices to capture latent factors that represent hidden patterns in the data. By reducing
the dimensionality, the computational complexity decreases, and the model becomes
more efficient.
Methods:
• Principal Component Analysis (PCA): PCA is a popular linear dimensionality
reduction method that transforms the original features into a new set of uncorrelated
variables (principal components) while preserving the variance in the data.
• Singular Value Decomposition (SVD): SVD is a matrix factorization technique that
decomposes a matrix into three other matrices, capturing latent factors. It is commonly
used in collaborative filtering for recommender systems.
• Non-Negative Matrix Factorization (NMF): NMF decomposes a matrix into
two lower-rank matrices with non-negative elements, making it suitable for
scenarios where non-negativity is a meaningful constraint.

Applications in Recommender Systems:


• Reducing Sparsity: Recommender system datasets are often sparse, with many
missing values in the user-item interaction matrix. Dimensionality reduction helps in
filling in missing values by approximating the original matrix with lower-rank
approximations.
• Capturing Latent Factors: By reducing the dimensionality, latent factors
representing user preferences and item characteristics can be identified, leading to
more efficient and effective recommendations.

SINGULAR VALUE DECOMPOSITION:


When it comes to dimensionality reduction, the Singular Value Decomposition (SVD) is
a popular linear algebra method for matrix factorization in machine learning. It shrinks
the feature space from N dimensions to K dimensions (where K < N). For recommender
systems, SVD is applied to a matrix whose rows are users, whose columns are items, and
whose elements are the users' ratings. SVD decomposes this user-item rating matrix A into
three matrices, A = U S V^T, and extracts the latent factors from the factorization:

• Matrix U: left singular matrix (users × latent factors)
• Matrix S: diagonal matrix (shows the strength of each latent factor)
• Matrix V: right singular matrix (items × latent factors)

From the matrix factorization, the latent factors describe the characteristics of the items.
The utility matrix A has shape m × n (m users, n items), and the factorization reduces its
dimension by extracting r latent factors. It expresses the relationship between users and
items by mapping both into an r-dimensional latent space, where vector X_i represents item i
and vector Y_u represents user u. The rating predicted for user u on item i is
$R_{ui} = X_i^{T} Y_u$. The factors are learned by minimizing the squared error between
this predicted rating and the observed rating $r_{ui}$:

$$\min_{X,Y} \sum_{(u,i)} \left( r_{ui} - X_i^{T} Y_u \right)^2$$

Regularization is used to avoid overfitting and to help the model generalize by adding a
penalty on the size of the factors:

$$\min_{X,Y} \sum_{(u,i)} \left( r_{ui} - X_i^{T} Y_u \right)^2 + \lambda \left( \lVert X_i \rVert^2 + \lVert Y_u \rVert^2 \right)$$

Here, we add bias terms to reduce the error between the actual and the predicted value of
the model.
(u, i): user-item pair
μ: the average rating of all items
b_i: the average rating of item i minus μ
b_u: the average rating given by user u minus μ
The equation below adds the bias terms and the regularization term:

$$\min_{X,Y,b} \sum_{(u,i)} \left( r_{ui} - \mu - b_u - b_i - X_i^{T} Y_u \right)^2 + \lambda \left( \lVert X_i \rVert^2 + \lVert Y_u \rVert^2 + b_u^2 + b_i^2 \right)$$
Introduction to truncated SVD


When it comes to matrix factorization techniques, truncated Singular Value
Decomposition (SVD) is a popular way to produce features: it factors a matrix M
into the three matrices U, Σ, and V. Another popular method is Principal Component
Analysis (PCA). Truncated SVD is closely related to PCA; the difference is that SVD works
directly on the data matrix, whereas PCA is derived from the covariance matrix.
Unlike a full SVD, truncated SVD lets you specify the number of columns (components)
to keep. For example, given an n x n matrix, truncated SVD generates matrices with the
specified number of columns, whereas a full SVD outputs matrices with n columns.
The advantages of truncated SVD over PCA
Truncated SVD can work directly on a sparse matrix to generate the feature matrices, whereas
PCA has to center the data to form the covariance matrix, which destroys sparsity and means
operating on the entire dense matrix.
1. Hands-on experience of python code
2. Data Description:

The metadata covers the 45,000 movies listed in the Full MovieLens Dataset, all released
on or before July 2017. Cast, crew, plot keywords, budget, revenue, posters, release
dates, languages, production companies, countries, TMDB vote counts and vote averages
are in the dataset. The ratings are on a scale of 1–5 and were obtained from the official
GroupLens website. The dataset is available as a Kaggle dataset.
3. Recommending movies using SVD
Singular value decomposition (SVD) is a collaborative filtering method for movie
recommendation. The aim for the code implementation is to provide users with movies’
recommendation from the latent features of item-user matrices. The code would show you
how to use the SVD latent factor model for matrix factorization.
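The idea can be sketched on a tiny made-up ratings matrix instead of the MovieLens data. This is a simplified illustration under stated assumptions, not the original notebook: the matrix `R`, the choice of `k = 2` latent factors, treating 0 as "not rated", and filling missing entries with the user's mean before decomposing are all choices made for this example.

```python
import numpy as np

# Hypothetical toy ratings (rows = users, columns = movies); 0 means "not rated".
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 5],
    [1, 0, 5, 4, 1],
    [0, 1, 4, 5, 2],
], dtype=float)
rated = R > 0

# Center each user's ratings on that user's mean; unrated cells become 0,
# i.e. they are imputed with the user's average rating.
user_mean = R.sum(axis=1) / rated.sum(axis=1)
centered = np.where(rated, R - user_mean[:, None], 0.0)

# Full SVD, then keep only the k strongest latent factors (truncation).
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
k = 2
low_rank = (U[:, :k] * s[:k]) @ Vt[:k, :]
predicted = low_rank + user_mean[:, None]

def recommend(user, n=1):
    """Indices of the n unrated movies with the highest predicted rating."""
    scores = np.where(rated[user], -np.inf, predicted[user])
    return [int(i) for i in np.argsort(scores)[::-1][:n]]

top_for_user_0 = recommend(0, n=2)
```

On real data the matrix is far too large and sparse for a dense `np.linalg.svd`; a sparse truncated solver or an SGD-trained latent factor model would be used instead.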
Applications in Recommender Systems:
• Matrix Factorization: SVD is used to factorize the user-item interaction matrix
into lower-rank approximations, capturing latent factors that represent user
preferences and item characteristics.
• Collaborative Filtering: SVD is a key technique in collaborative filtering-based
recommender systems, where it helps in identifying latent relationships between
users and items.
• Handling Sparsity: SVD can handle sparse matrices effectively, providing a
way to impute missing values in the original matrix and improving the quality
of recommendations.
• Regularization Techniques: Regularized variants of SVD add a penalty term on
the size of the latent factors to prevent overfitting and enhance the
generalization ability of the model.

UNIT-I-INTRODUCTION
EXTERNAL LEARNING – SINGULAR VALUE DECOMPOSITION (SVD)
APPLICATIONS

According to the formula for SVD,

$$A = U \Sigma V^{T}$$

1. A is the input matrix,
2. U contains the left singular vectors,
3. Sigma is the diagonal matrix of singular values (the square roots of the eigenvalues of A(transpose)A),
4. V contains the right singular vectors.
The shapes of these matrices will be
1. A — m x n matrix
2. U — m x k matrix
3. Sigma — k x k matrix
4. V — n x k matrix
Step 1
So, as the first step, we need to find the eigenvalues of A(transpose)A rather than of A
itself: because A can be a rectangular matrix, we first convert it to a square matrix by
multiplying its transpose with A. Here, for easier computation, I have taken A as a
2 x 2 matrix.
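As a worked illustration, assume the 2 x 2 matrix is the one that the final check of this walkthrough arrives at (treat this choice as an assumption of the example). Multiplying its transpose with A gives a symmetric square matrix:

$$A = \begin{pmatrix} 4 & 0 \\ 3 & 5 \end{pmatrix}, \qquad A^{T}A = \begin{pmatrix} 4 & 3 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 4 & 0 \\ 3 & 5 \end{pmatrix} = \begin{pmatrix} 25 & 15 \\ 15 & 25 \end{pmatrix}$$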

Step 2
Now that we have a square matrix, we can calculate the eigenvalues of A(transpose)A. We
can do so by setting the determinant of A(transpose)A - (lambda)I to zero, where lambda
stands for the two eigenvalues.

Solving the equation, we get
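Assuming A(transpose)A is the matrix with 25 on the diagonal and 15 off the diagonal (which follows if A has rows (4, 0) and (3, 5), as in the final check of this walkthrough), the determinant expands as:

$$\det\!\left(A^{T}A - \lambda I\right) = (25 - \lambda)^2 - 15^2 = \lambda^2 - 50\lambda + 400 = (\lambda - 10)(\lambda - 40) = 0$$

so the two eigenvalues are $\lambda_1 = 40$ and $\lambda_2 = 10$.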

Step 3
Once we have calculated the eigenvalues, it's time to calculate the eigenvector for each of
the two eigenvalues. So, let's start by calculating the eigenvector for 10.
Step 3.1
We plug the value of lambda into the A(transpose)A - (lambda)I matrix.

In order to find the eigenvector, we need to find the null space of the matrix from Step 3.1,
i.e. the non-zero vectors x for which (A(transpose)A - (lambda)I) x = 0. In other words,

Next, we need to reduce this matrix to the Row-Echelon Form so that we can easily solve the
equation. Let’s talk about Row-Echelon for a moment here.
Row-Echelon Form
A matrix is said to be in row-echelon form if the following rules are satisfied.
1. The leading entry in each non-zero row of the matrix is 1
2. If a column contains a leading entry then all the entries below the leading entry should
be zero
3. In any two consecutive non-zero rows, the leading entry in the upper row should occur
to the left of the leading entry in the lower row.
4. All rows which consist only of zeros should occur at the bottom of the matrix
We need to perform some operations on the rows to reduce the matrix. These operations are
called elementary row operations: swapping two rows, multiplying a row by a non-zero
number, and adding a multiple of one row to another row.

Armed with the above rules, let's start reducing the matrix in Step 3.1 to row-echelon form.

Now, we can solve for null space as below to find the eigenvector for eigenvalue 10

Once we get this vector, we need to convert it to a unit vector. We do that by dividing each
entry of the column by the square root of the sum of squares of all its entries. So, in this
case we do the following,

So the final eigenvector for eigenvalue 10 is

We follow the same steps to get the eigenvector for eigenvalue 40

EigenVector for 40
Now that we have got both the eigenvectors, let’s put it together.
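As a worked illustration, assume A(transpose)A has 25 on the diagonal and 15 off the diagonal, as it does when A has rows (4, 0) and (3, 5). Eigenvectors are only determined up to sign, so the signs below are one consistent choice rather than the only valid one:

$$v_{40} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad v_{10} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

$$V = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sqrt{40} & 0 \\ 0 & \sqrt{10} \end{pmatrix}$$

The singular values on the diagonal of Sigma are the square roots of the eigenvalues, and the columns of V are ordered to match them.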

Note that the diagonal values in sigma are always in descending order, and so the vectors
are also placed in that corresponding order. If you are familiar with PCA, the principal
components correspond to the top k diagonal elements, which capture the most variance. The
higher the value, the more important the component is and the more variance it describes.
Step 4
Now that we have our V and Sigma matrices, it's time to find U. We multiply both sides of
the equation A = U Sigma V(transpose) on the right by V and then by sigma(inverse). As
V is an orthogonal matrix, its transpose and inverse are the same, so V(transpose)
multiplied by V becomes an identity matrix, leaving U = A V sigma(inverse).
Note: The factor applied here is sigma(inverse), not sigma(transpose).

Next up, we need to convert this to unit vectors using the steps described above.

Next up, we multiply this matrix with Sigma (transpose), which is the same as sigma itself
because it is a diagonal matrix.

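$$\begin{pmatrix} \sqrt{8} & 2\sqrt{2} \\ 2\sqrt{8} & -\sqrt{2} \end{pmatrix} = \begin{pmatrix} 2\sqrt{2} & 2\sqrt{2} \\ 4\sqrt{2} & -\sqrt{2} \end{pmatrix}$$

Now, we need to convert this to unit vectors to get the final U matrix.

$$U = \begin{pmatrix} \frac{2\sqrt{2}}{\sqrt{40}} & \frac{2\sqrt{2}}{\sqrt{10}} \\ \frac{4\sqrt{2}}{\sqrt{40}} & -\frac{\sqrt{2}}{\sqrt{10}} \end{pmatrix} = \begin{pmatrix} \frac{2}{2\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{4}{2\sqrt{5}} & -\frac{1}{\sqrt{5}} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \end{pmatrix}$$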

So, there you go, we have calculated U, sigma and V and decomposed the matrix A into three
matrices as given below.

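$$U \Sigma V^{T} = \begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \end{pmatrix} \begin{pmatrix} \sqrt{40} & 0 \\ 0 & \sqrt{10} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}$$

$$= \begin{pmatrix} \frac{\sqrt{40}}{\sqrt{5}} & \frac{2\sqrt{10}}{\sqrt{5}} \\ \frac{2\sqrt{40}}{\sqrt{5}} & -\frac{\sqrt{10}}{\sqrt{5}} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \sqrt{8} & 2\sqrt{2} \\ 2\sqrt{8} & -\sqrt{2} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}$$

$$= \begin{pmatrix} 2\sqrt{2} & 2\sqrt{2} \\ 4\sqrt{2} & -\sqrt{2} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 2+2 & 2-2 \\ 4-1 & 4+1 \end{pmatrix} = \begin{pmatrix} 4 & 0 \\ 3 & 5 \end{pmatrix} = A$$

Thus

$$A = U \Sigma V^{T}$$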
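The hand calculation can be cross-checked numerically. The snippet below is a small sketch that assumes A is the 2 x 2 matrix with rows (4, 0) and (3, 5) and uses NumPy's built-in routines; NumPy may return singular vectors with different signs than the hand calculation, which is expected because singular vectors are only unique up to sign.

```python
import numpy as np

A = np.array([[4.0, 0.0],
              [3.0, 5.0]])

# Eigenvalues of the symmetric matrix A^T A (returned in ascending order).
eigenvalues = np.linalg.eigvalsh(A.T @ A)

# SVD: singular values come back in descending order.
U, s, Vt = np.linalg.svd(A)

# Multiplying the three factors back together should recover A.
reconstructed = U @ np.diag(s) @ Vt
```

The singular values in `s` should equal the square roots of the eigenvalues of A(transpose)A, and `reconstructed` should match A up to floating-point rounding.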

Unit-I-Question Bank

1. What are Recommender systems?


A Recommendation System is a subclass of information filtering system that seeks to
predict the rating or preference a user would give to an item.
Recommender systems usually make use of collaborative filtering, content-based
filtering, or both, as well as other approaches such as knowledge-based systems.

2. List the types of Recommender system.

3. What are the primary goals of Recommender system?


Prediction version of problem: The first approach is to predict the rating value for
a user-item combination. It is assumed that training data is available, indicating user
preferences for items. For m users and n items, this corresponds to an incomplete
m × n matrix, where the specified (or observed) values are used for training. The
missing (or unobserved) values are predicted using this training model. This
problem is also referred to as the matrix completion problem because we have an
incompletely specified matrix of values, and the remaining values are predicted by
the learning algorithm.
Ranking version of problem: In practice, it is not necessary to predict the ratings
of users for specific items in order to make recommendations to users. Rather, a
merchant may wish to recommend the top-k items for a particular user, or determine
the top-k users to target for a particular item. The determination of the top-k items is
more common than the determination of top-k users, although the methods in the
two cases are exactly analogous.

4. What are the steps tailored by Recommender system?


1 — Understand the Business
2 — Get the Data.
3 — Explore, Clean, and Augment the Data.
4 — Predict the Ranking.
5 — Visualize the Data.
6 — Iterate and Deploy Models.
5. What is the basic principle that underlies the working of recommendation
algorithms?
The basic principle of recommendations is that there are significant dependencies
between user- and item-centric activity.

6. What is content-based filtering?


Content-Based Filtering is a type of recommender system in which the
recommendations are based on the similarity between the content of the items being
recommended and the content of items the user has liked or consumed in the past.
This approach builds a model that represents the user’s preferences based on item
features.

A classic example of content-based filtering is the “related items” feature in online
marketplaces. For example, if a user liked a smartphone, a content-based
recommender system would recommend other smartphones with similar features,
such as a large screen size, high resolution, and fast processor.

7. What are Collaborative Filtering Recommender Systems?
Collaborative filtering is a type of recommender system that predicts what items a
user might like based on the preferences of similar users. It works by analyzing user
behavior and finding patterns that can be used to make recommendations.
One common approach to collaborative filtering is user-based filtering, where the
system identifies users with similar preferences and recommends items that those
users have liked in the past. Another approach is item-based filtering, where the
system recommends items that are similar to those that the user has already liked.

8. What are Hybrid Recommender Systems?
A Hybrid Recommender System combines two or more recommendation techniques
in order to achieve better accuracy and coverage in the recommendations. The two
main types of systems used in hybrid models are Collaborative Filtering and Content-
Based Filtering.
Collaborative Filtering uses data on user behavior, such as ratings or clicks, to
recommend items based on the preferences of similar users. Content-Based Filtering
uses data on the features of the items, such as genre or topic, to recommend items based
on the interests of the user.
One example of a hybrid recommender system is the Netflix recommendation system.
Netflix uses collaborative filtering to suggest movies based on similar users'
preferences, but also incorporates content-based filtering by suggesting titles based on
the genre, actor, or director that the user has previously viewed.

9. What is meant by Matrix Factorization Recommender Systems?


Matrix factorization is a class of collaborative filtering algorithms used in recommender
systems. Matrix factorization algorithms work by decomposing the user-item
interaction matrix into the product of two lower dimensionality rectangular matrices.
Matrix factorization models decompose the user-item interaction matrix into latent
factors, allowing the system to make predictions based on these factors.
Example: Singular Value Decomposition (SVD) and Alternating Least Squares
(ALS) are common matrix factorization techniques used in recommender systems.
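As a rough illustration of the decomposition into two lower-dimensional matrices, the sketch below fits user factors `P` and item factors `Q` by plain gradient descent on the observed ratings. It assumes NumPy is available; the rating matrix, the number of factors `k`, the learning rate, and the regularization strength are all made-up values, and gradient descent is used here as a simple stand-in rather than SVD or ALS.

```python
import numpy as np

# Hypothetical 4x4 rating matrix; 0 marks an unobserved rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0                      # True where a rating was observed

rng = np.random.default_rng(0)
k = 2                             # number of latent factors (assumed)
P = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors

def rmse(P, Q):
    """Root mean square error on the observed entries only."""
    err = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

initial_rmse = rmse(P, Q)
lr, reg = 0.01, 0.02              # learning rate and L2 penalty (assumed)
for _ in range(2000):
    E = (R - P @ Q.T) * mask      # errors, zeroed on unobserved cells
    P_step = E @ Q - reg * P
    Q_step = E.T @ P - reg * Q
    P += lr * P_step
    Q += lr * Q_step
final_rmse = rmse(P, Q)

predicted = P @ Q.T               # includes estimates for unobserved cells
print(initial_rmse, final_rmse)
```

The product `P @ Q.T` fills every cell of the matrix, so the entries that were 0 in `R` become predicted ratings.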
10. What is meant by Context-Aware Recommender Systems?
Context-aware recommender systems (CARS) generate more relevant
recommendations by adapting them to the specific contextual situation of the user.
For example, a context-aware news recommender mines patterns from the user's past
interactions on the web and uses them to recommend future news articles.
Context-aware systems take into account additional contextual information, such as
time, location, or user activity, to enhance the relevance of recommendations.
11. What is meant by Knowledge-Based Recommender Systems?
Knowledge-based recommender systems (knowledge-based recommenders) are a specific
type of recommender system based on explicit knowledge about the item
assortment, user preferences, and recommendation criteria (i.e., which item should be
recommended in which context).

12. What is meant by Deep Learning-Based Recommender Systems?
Deep learning (DL) is a powerful technique for product recommendations, inspired by
the brain's structure and function. It can process data in a non-linear way, extracting
hidden insights and generating more accurate recommendations.
In the training phase, the model is trained to predict user-item interaction probabilities
(calculate a preference score) by presenting it with examples of interactions (or non-
interactions) between users and items from the past.

13. List the goals of Recommender system.


• Personalization
• Increased User Satisfaction
• Improved Engagement
• Diversity
• Accuracy
• Scalability
• Adaptability
• Explainability
• Serendipity
• Privacy
• Cross-Domain Recommendations
• Novelty
• Cold Start Problem
• Long-Term Recommendations
• Business Objectives
14. What is meant by personalized recommender system?
Personalized recommendation systems are designed to provide tailored recommendations to
individual users based on their past behavior, preferences, and demographic information.

15. What is a non-personalized recommender system?


Non-personalized recommender systems provide the same set of recommendations to all
users, without considering individual user preferences or behaviors. These systems often
focus on providing popular or trending items that are likely to appeal to a broad audience.
Examples:
o "Top 10" lists or rankings
o Bestsellers
o Most viewed items
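A "most viewed" list of this kind needs nothing more than interaction counts. The following is a minimal sketch; the `interactions` log and the function name `top_n` are hypothetical.

```python
from collections import Counter

# Hypothetical interaction log of (user, item) view events.
interactions = [
    ("u1", "A"), ("u2", "A"), ("u3", "B"),
    ("u1", "C"), ("u2", "B"), ("u4", "A"),
]

def top_n(log, n):
    """Return the n most frequently viewed items, ignoring who viewed them."""
    counts = Counter(item for _user, item in log)
    return [item for item, _count in counts.most_common(n)]

# Every user receives the same list.
print(top_n(interactions, 2))
```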
16. List the types of recommender system.

17. What is traditional recommender system?
Traditional recommender systems typically use explicit input features or rules to generate
recommendations. These systems often rely on general attributes of items or users, and their
recommendations are not personalized to the specific preferences or behaviour of individual
users.

Example Features:
o Genre of a movie
o Author of a book
o Popularity or overall ratings of items

18. List the common evaluation metrics used in Recommender system.


Accuracy metrics assess the accuracy of the recommendations made by a system in
terms of how well they match the user’s actual preferences or behavior. Here we have
Mean Absolute Error (MAE), Root Mean Square Error (RMSE), or Mean Squared
Logarithmic Error (MSLE).
Ranking metrics evaluate how well a recommender system ranks items for a user,
especially in top-N recommendation scenarios. Think of hit rate, average reciprocal
hit rate (ARHR), cumulative hit rate, or rating hit rate.
Diversity metrics assess the diversity of recommended items to ensure that
recommendations are not overly focused on a narrow set of items. These include
Intra-List Diversity or Inter-List Diversity.
Novelty metrics evaluate how well a recommender system introduces users to new or
unfamiliar items. Catalog coverage and item popularity belong to this category.
Serendipity metrics assess the system’s ability to recommend unexpected but
interesting items to users – surprise or diversity are looked at in this case.
Business metrics such as conversion rate, click-through rate (CTR), or revenue
impact.
Online evaluation of a recommender system is typically carried out through A/B testing.
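Two of the accuracy metrics and one ranking metric named above can be computed in a few lines. This is a sketch on made-up numbers; the lists `actual` and `predicted`, the per-user dictionaries, and the function name `hit_rate` are hypothetical.

```python
from math import sqrt

# Hypothetical true and predicted ratings for four user-item pairs.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]
n = len(actual)

# Mean Absolute Error: average size of the rating error.
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# Root Mean Square Error: penalizes large errors more heavily than MAE.
rmse = sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def hit_rate(top_n_lists, held_out):
    """Fraction of users whose held-out item appears in their top-N list."""
    hits = sum(1 for user, item in held_out.items()
               if item in top_n_lists.get(user, []))
    return hits / len(held_out)

top_n_lists = {"u1": ["A", "B", "C"], "u2": ["D", "E", "F"]}
held_out = {"u1": "B", "u2": "Z"}

print(mae, rmse, hit_rate(top_n_lists, held_out))
```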
19. How does data mining play a vital role in building a recommender system?
One of the best-known examples of data mining in recommender systems is the discovery of
association rules, or item-to-item correlations (Sarwar et al., 2001). These techniques identify
items frequently found in “association” with items in which a user has expressed interest.
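The strength of an association rule is usually summarized by its support and confidence. The sketch below computes both for a single rule; the `baskets` data and the function names are hypothetical, and no particular mining algorithm is implied.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Rule: {bread} -> {butter}
print(support({"bread", "butter"}), confidence({"bread"}, {"butter"}))
```

A rule with high confidence suggests recommending the consequent item to users who have shown interest in the antecedent.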
20. Which data mining algorithms are used for building recommendation systems?
Various data mining algorithms are used to build recommender systems, such as k-means,
PageRank, expectation-maximization (EM), and Apriori.
21. What is meant by similarity measures in Recommender system?
Similarity in a recommender system is about finding items (or users, or user-item pairs) that are
similar. How it is measured depends on the type of recommender in use. In
collaborative filtering, for example, two items are similar if the same users tend to like or
dislike them in the same way.
22. Which are the three main components of a recommender system?
There are three main components in any recommender system: data set, algorithm,
and recommendations

23. What is meant by Cosine Similarity?
Definition: Measures the cosine of the angle between two vectors, representing users
or items, in a multidimensional space.
Cosine similarity is a measure used to determine the similarity between two non-zero
vectors in a vector space. It calculates the cosine of the angle between the vectors,
representing their orientation and similarity:

$$ \cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|} $$

A · B denotes the dot product of vectors A and B, which is the sum of the element-
wise multiplication of their corresponding components.
||A|| represents the Euclidean norm or magnitude of vector A, calculated as the square
root of the sum of the squares of its components.
||B|| represents the Euclidean norm or magnitude of vector B.
The resulting value ranges from -1 to 1, where 1 indicates that the vectors are in the
same direction (i.e., completely similar), -1 indicates they are in opposite directions
(i.e., completely dissimilar), and 0 indicates they are orthogonal (i.e.,
no similarity). It is particularly useful in scenarios where the magnitude of the vectors
is not significant, and the focus is on the direction or relative orientation of the vectors.
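The three reference values (1, 0, and -1) can be checked with a small sketch. The function name `cosine_similarity` and the example vectors are chosen here for illustration only.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: similarity is 1 (up to rounding).
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])

# Perpendicular vectors: similarity is 0.
orthogonal = cosine_similarity([1, 0], [0, 1])

# Opposite directions: similarity is -1 (up to rounding).
opposite = cosine_similarity([1, 2], [-1, -2])

print(same_direction, orthogonal, opposite)
```

Doubling every component of a vector leaves the result unchanged, which is why the measure suits cases where only direction matters.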
24. What is meant by the Pearson Correlation Coefficient?
• The Pearson correlation coefficient, also known as Pearson’s correlation or
simply correlation coefficient, is a statistical measure that quantifies the linear
relationship between two variables. It measures how closely the data points of the
variables align on a straight line, indicating the strength and direction of the
relationship.

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

Here x̄ and ȳ are the means of the two variables.
The Pearson correlation coefficient is denoted by the symbol “r” and takes values
between -1 and 1. The coefficient value indicates the following:
• r = 1: Perfect positive correlation. The data points lie exactly on a line with
positive slope, meaning that as one variable increases, the other variable also increases
proportionally.
• r = -1: Perfect negative correlation. The data points lie exactly on a line with
negative slope, meaning that as one variable increases, the other variable decreases
proportionally.
• r = 0: No linear correlation. There is no linear relationship between the
variables, although they may still be related in a non-linear way.
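A direct implementation of the coefficient makes the three cases concrete. This is a minimal sketch with made-up data; the function name `pearson` is chosen here for illustration. The last example shows that r = 0 only rules out a linear relationship, since y is fully determined by x there.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson(x, [2, 4, 6, 8, 10])    # y = 2x
r_negative = pearson(x, [10, 8, 6, 4, 2])    # y = 12 - 2x

# y = x^2 on a symmetric range: a clear relationship, but not a linear one.
z = [-2, -1, 0, 1, 2]
r_zero = pearson(z, [v * v for v in z])

print(r_positive, r_negative, r_zero)
```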
25. What is Jaccard Similarity?
Jaccard similarity measures the intersection over the union of two sets, quantifying how
similar the sets are.
• It calculates the size of the intersection of the sets divided by the size of their
union:

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$

The resulting value ranges from 0 to 1, where 0 indicates no similarity and 1
indicates complete similarity.
• In other words, to calculate the Jaccard similarity, you count the
common elements between the sets of interest and divide that count by the total number of
distinct elements across both sets.
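For two users represented by the sets of items they liked, the calculation looks like this. The item names are placeholders, and returning 0.0 for two empty sets is a convention assumed here, not something the definition fixes.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(union)

# Hypothetical sets of liked items for two users.
user_1 = {"item1", "item2", "item3"}
user_2 = {"item2", "item3", "item4", "item5"}

# 2 shared items out of 5 distinct items.
print(jaccard(user_1, user_2))
```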
26. What are the applications of dimensionality reduction in Recommender Systems?
Reducing Sparsity: Recommender system datasets are often sparse, with many
missing values in the user-item interaction matrix. Dimensionality reduction helps in
filling in missing values by approximating the original matrix with lower-rank
approximations.
Capturing Latent Factors: By reducing the dimensionality, latent factors representing
user preferences and item characteristics can be identified, leading to more efficient
and effective recommendations.
27. What is meant by Singular Value Decomposition?
When it comes to dimensionality reduction, the Singular Value Decomposition (SVD)
is a popular method in linear algebra for matrix factorization in machine learning. The
Singular Value Decomposition of a matrix is a factorization of the matrix into three
matrices. The singular value decomposition of a matrix A can be written as the
product of three matrices:

$$ A = U \Sigma V^{T} $$

SVD Formula
1. A is the input matrix.
2. U contains the left singular vectors.
3. Σ (Sigma) is the diagonal matrix of singular values.
4. V contains the right singular vectors.
The shapes of these matrices are
1. A: m x n matrix
2. U: m x k matrix
3. Sigma: k x k matrix
4. V: n x k matrix
where k is the number of singular values kept (at most min(m, n)).
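The shapes listed above, and the use of SVD for dimensionality reduction, can be checked numerically. The sketch below assumes NumPy is available and uses `numpy.linalg.svd`; the matrix `A` and the choice k = 2 are made up for illustration.

```python
import numpy as np

# Hypothetical 5 x 4 user-item matrix (m = 5 users, n = 4 items).
A = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [1, 0, 4, 5],
              [5, 5, 0, 0]], dtype=float)

# Reduced SVD: U is m x r, s holds r singular values, Vt is r x n, r = min(m, n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (rank-k truncation).
k = 2
U_k = U[:, :k]            # m x k
S_k = np.diag(s[:k])      # k x k
V_k = Vt[:k, :].T         # n x k

# Rank-k approximation of A.
A_k = U_k @ S_k @ V_k.T

# The approximation error equals the energy in the discarded singular values.
error = np.linalg.norm(A - A_k)
print(U_k.shape, S_k.shape, V_k.shape, error)
```

Keeping fewer singular values gives a more compact representation at the cost of a larger approximation error.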
28. List the differences between Content-Based Filtering and Collaborative Filtering.

Aspect | Content-Based Filtering | Collaborative Filtering
Focus | Item attributes | User behaviour
Recommendation | Items similar to what the user likes | Items liked by users similar to the user
Data required | Information about the item | User behavior data, such as ratings or purchases
Advantage | Doesn’t require data from other users | Can recommend niche or unexpected items
Disadvantage | May miss out on new interests | Needs sufficient user data to be effective

Part -B
1. Enumerate the goals of Recommender system.
Recommender systems aim to enhance user experience and satisfaction by providing
personalized suggestions or recommendations. The goals of recommender systems
include:
Personalization: Tailor recommendations to individual user preferences, behaviors, and
needs.
Increased User Satisfaction: Enhance user experience by offering relevant and
interesting suggestions.
Improved Engagement: Encourage users to explore and interact with the system by
presenting appealing recommendations.
Diversity: Provide a variety of recommendations to avoid monotony and introduce users
to a broader range of items.
Accuracy: Deliver accurate predictions and recommendations based on user data and
preferences.
Serendipity (Chance): Introduce users to unexpected but relevant items that they may
not have discovered or considered on their own.
Scalability: Efficiently handle a growing number of users and items while maintaining
recommendation quality.
Adaptability: Adjust recommendations over time to reflect changes in user preferences
and behaviors.
Explainability: Offer clear explanations for why certain recommendations are made,
helping users understand and trust the system.
Privacy: Safeguard user privacy by minimizing the collection and exposure of sensitive
information.
Cross-Domain Recommendations: Extend recommendations across different domains
or types of items to cater to diverse user interests.
Novelty: Introduce users to novel or less-known items that align with their preferences,
fostering exploration.
Cold Start Problem: Address challenges related to new users or items lacking sufficient
historical data for accurate recommendations.
Long-Term Recommendations: Provide suggestions that align with users' long-term
preferences and evolving tastes.

Business Objectives: Align recommendations with the business goals, such as increasing
sales, user engagement, or other key performance indicators.
Recommender systems leverage various algorithms and techniques, including
collaborative filtering, content-based filtering, hybrid methods, and deep learning, to achieve
these goals. The specific approach depends on the nature of the data and the requirements of
the application.
2. Explain SVD applications with an example?
3. Suppose we have a four-dimensional dataset (Features 1 through 4). Find the
following similarity measures between Row 1 and Row 3:
a. Cosine
b. Jaccard index
c. Weighted Jaccard Index
d. Tanimoto coefficient/index/similarity
Row | Feature 1 | Feature 2 | Feature 3 | Feature 4
Row 1 | 10 | 3 | 3 | 5
Row 2 | 5 | 4 | 5 | 3
Row 3 | 9 | 4 | 6 | 4
Row 4 | 8 | 6 | 2 | 6
Row 5 | 20 | 15 | 10 | 20
Step 1: Extract Row Vectors
From the table, we extract the feature values for Row 1 and Row 3:
Row 1=(10,3,3,5)
Row 3=(9,4,6,4)

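(a) Cosine Similarity
Cosine similarity is computed as:

$$ \cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2} \, \sqrt{\sum_i B_i^2}} $$

(b) Jaccard Index
The Jaccard index is used for binary data and is defined as:

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$

Here each row is treated as the set of features whose value is nonzero.

(c) Weighted Jaccard Index
For numerical data, the Weighted Jaccard Index is computed as:

$$ J_w(A, B) = \frac{\sum_i \min(A_i, B_i)}{\sum_i \max(A_i, B_i)} $$

(d) Tanimoto Coefficient
The Tanimoto coefficient (also called the extended Jaccard similarity for real-valued vectors) is given by:

$$ T(A, B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B} $$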
Step 2: Compute the Values


Let's compute these similarity measures using Python.
Computed Similarity Measures Between Row 1 and Row 3:
1. Cosine Similarity = 0.9591 (High similarity)
2. Jaccard Index = 1.0 (All features are nonzero in both rows)
3. Weighted Jaccard Index = 0.76 (Moderate similarity based on feature magnitudes)
4. Tanimoto Coefficient = 0.9211 (Similar to cosine similarity but considers squared magnitudes)
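The four values above can be reproduced with the short script below. It is a sketch using only the standard library; the variable names are chosen here, and the binary Jaccard index is computed by treating every nonzero feature as present, which is an assumption consistent with the stated result.

```python
from math import sqrt

row1 = [10, 3, 3, 5]
row3 = [9, 4, 6, 4]

dot = sum(a * b for a, b in zip(row1, row3))   # 90 + 12 + 18 + 20 = 140
sq1 = sum(a * a for a in row1)                 # 100 + 9 + 9 + 25 = 143
sq3 = sum(b * b for b in row3)                 # 81 + 16 + 36 + 16 = 149

# (a) Cosine similarity
cosine = dot / (sqrt(sq1) * sqrt(sq3))

# (b) Jaccard index on the sets of nonzero feature positions
present1 = {i for i, a in enumerate(row1) if a != 0}
present3 = {i for i, b in enumerate(row3) if b != 0}
jaccard = len(present1 & present3) / len(present1 | present3)

# (c) Weighted Jaccard index: sum of minima over sum of maxima (19 / 25)
weighted_jaccard = (sum(min(a, b) for a, b in zip(row1, row3))
                    / sum(max(a, b) for a, b in zip(row1, row3)))

# (d) Tanimoto coefficient: 140 / (143 + 149 - 140)
tanimoto = dot / (sq1 + sq3 - dot)

print(round(cosine, 4), jaccard, round(weighted_jaccard, 2), round(tanimoto, 4))
# prints: 0.9591 1.0 0.76 0.9211
```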

4. List some of the popular examples of Recommender Systems


5. Discuss on content based personalized recommender system with examples.
6. Discuss on collaborative filtering and its types in detail.

7. Enumerate the differences between a collaborative recommendation engine and a
content-based recommendation engine.

8. Discuss the steps to build a recommender system.
Define the problem: Identify the type of Recommender System that best suits the
problem. For example, if the need is to recommend a set of products to a user
and interaction data is available, a Collaborative Filtering-based Recommender
System is a common choice.
Gather and preprocess the Data: Collect sufficient and adequate data on Users
and Items, with relevant metadata. Carry out Data cleaning, Preprocessing and
Feature Engineering (if necessary).
Split the Data: Divide the Preprocessed data into Training, Validation and Test
sets. The proportions can vary depending on the size of the dataset, but a
70-20-10% split is a typical starting point.
Select appropriate Metrics: Decide on the relevant Evaluation Metrics to
measure the performance of the Recommender System.
Develop the Model: Develop and fine-tune a suitable Modelling approach based
on the Split data, the Type of Recommender System, and the Evaluation Metrics.
For example, a Matrix Factorization approach using Gradient Descent could be used
for Collaborative Filtering-based Recommender Systems.
Train the Model: Train the Model on the Training data and Validate the Model
on the Validation data, adjusting the hyperparameters if necessary.
Assess Model Performance: Test the performance of the Model on the Test data
using the Evaluation metrics previously defined. Output the final results, such as
Precision or Recall, to determine which Model performs the best in production.
Deploy the Model: After choosing the best Model, deploy it for use in
Production.
Maintain the Model: Maintain the Model by periodically retraining, as necessary,
based on newly collected data.
Iterate: Iterate on the entire process to improve the Model's accuracy and
efficiency continually.
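The data-splitting step above can be sketched as follows. This is an illustration under simple assumptions: the function name `split_data`, the fixed seed, and the placeholder interaction records are hypothetical, and a random split is only one option (time-based splits are also common for interaction data).

```python
import random

def split_data(rows, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle the rows and cut them into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)      # fixed seed keeps the split reproducible
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]          # whatever remains (about 10%)
    return train, val, test

# Hypothetical interaction records: (user_id, item_id) pairs.
interactions = [(u, i) for u in range(10) for i in range(10)]
train, val, test = split_data(interactions)
print(len(train), len(val), len(test))
```

The model is then fitted on `train`, tuned on `val`, and evaluated once on `test`.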

9. Discuss on the basic types of Recommender system.
Recommender systems can be broadly categorized into several types based on their
underlying algorithms and techniques. The basic taxonomy of recommender systems
includes:
Collaborative Filtering:
User-Based Collaborative Filtering: Recommends items based on the preferences
and behaviors of users with similar tastes.
Item-Based Collaborative Filtering: Suggests items that are similar to those liked by
the user.
Content-Based Filtering:
Analyzes the features of items and recommends items with similar characteristics to
those the user has shown interest in.
Hybrid Recommender Systems:
Combines multiple recommendation approaches, such as collaborative filtering and
content-based filtering, to improve overall accuracy and overcome individual
limitations.
Knowledge-Based Recommender Systems:
Utilizes explicit knowledge about users and items to make recommendations. This
knowledge is often provided by experts or encoded in a knowledge base.
Context-Aware Recommender Systems:
Takes into account contextual information such as location, time, or the user's current
activity to provide more relevant and personalized recommendations.
Matrix Factorization:
Decomposes the user-item interaction matrix into latent factors to capture hidden
patterns and relationships between users and items.
Deep Learning-Based Recommender Systems:
Utilizes neural networks and deep learning architectures to automatically learn
intricate patterns and representations from user-item interactions.
Association Rule Mining:
Discovers relationships or associations between different items based on historical
user behavior, commonly used in basket analysis.

Demographic-Based Recommender Systems:
Considers demographic information about users, such as age, gender, or occupation,
to tailor recommendations.
Community-Based Recommender Systems:
Leverages the wisdom of the crowd by considering recommendations from a
community of users with similar preferences.
Implicit Feedback Recommender Systems:
Handles implicit feedback, such as clicks or views, to infer user preferences and make
recommendations.
Ephemeral Recommender Systems:
Focuses on recommending items with a short lifespan, such as news articles, based on
current trends and user preferences.
10. Depict the basic taxonomy of recommender systems with examples. (April-May-2024)
Basic Taxonomy of Recommender Systems with Examples
Recommender systems are broadly classified based on how they generate recommendations. The main categories
include Collaborative Filtering, Content-Based Filtering, Hybrid Systems, Knowledge-Based Systems, and
Popularity-Based Systems.

1. Collaborative Filtering (CF)


Collaborative Filtering makes recommendations based on past user-item interactions (e.g., ratings, clicks,
purchases). It assumes that users with similar tastes will like similar items.
Types of Collaborative Filtering:
A. User-Based Collaborative Filtering
• Finds users with similar preferences and recommends items liked by those users.
• Example:
o Netflix recommends movies based on what other users with similar viewing history liked.
o Amazon suggests products frequently bought by users with similar shopping behavior.
B. Item-Based Collaborative Filtering
• Recommends items that are similar to what the user has interacted with before.
• Example:
o Amazon’s “Customers who bought this also bought” feature.
o Spotify suggesting songs that are frequently played together in user playlists.
C. Matrix Factorization-Based Collaborative Filtering
• Uses mathematical techniques like Singular Value Decomposition (SVD) to discover hidden patterns in
user-item interaction data.
• Example:
o The Netflix Prize-winning solutions relied heavily on SVD-style matrix factorization to predict user preferences.
Limitation: Struggles with the cold start problem (new users and items lack sufficient data).

2. Content-Based Filtering (CBF)


This method recommends items based on the features of the items (e.g., genre, description, keywords) and user
preferences.
How It Works:
• Builds a user profile based on past interactions.
• Suggests items that have similar attributes to the ones the user has liked before.
Examples:
• Spotify: Recommends songs based on genres, artists, or tempo of previously listened songs.
• Netflix: Suggests movies similar to those watched before (e.g., if you watch sci-fi movies, more sci-fi
movies will be recommended).
• News websites: Google News recommends articles similar to the ones you frequently read.
Limitation:
• Cold start problem for new items: If an item has no metadata, it cannot be recommended.
• Limited diversity: The system keeps recommending similar items, leading to a filter bubble (users only
see content reinforcing their preferences).
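The two steps under "How It Works" (build a user profile, then suggest items with similar attributes) can be sketched as below. The movie titles, the three-feature encoding, and the function names are hypothetical; averaging liked items and ranking by cosine similarity is one simple choice among many.

```python
from math import sqrt

# Hypothetical item features: [sci-fi, comedy, drama]
item_features = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 0, 1],
    "Movie C": [0, 1, 0],
    "Movie D": [1, 1, 0],
    "Movie E": [0, 0, 1],
}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked):
    """User profile: the average feature vector of the items the user liked."""
    vectors = [item_features[title] for title in liked]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(liked):
    """Rank unseen items by similarity between their features and the profile."""
    profile = build_profile(liked)
    candidates = [title for title in item_features if title not in liked]
    return sorted(candidates,
                  key=lambda title: cosine(profile, item_features[title]),
                  reverse=True)

# A user who liked two sci-fi titles gets the remaining sci-fi title first.
ranking = recommend(["Movie A", "Movie B"])
print(ranking)
```

Because the ranking depends only on item features, items sharing no features with the profile sink to the bottom, which is the filter-bubble effect noted above.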

3. Hybrid Recommender Systems

Hybrid systems combine multiple recommendation approaches (e.g., Collaborative + Content-Based) to
overcome individual limitations and improve accuracy.
Types of Hybrid Approaches:
1. Weighted Hybrid: Assigns different weights to multiple recommendation methods and combines them.
o Example: YouTube suggests videos using both user history (Collaborative) and video metadata
(Content-Based).
2. Switching Hybrid: Switches between methods based on available data.
o Example: Amazon may use Content-Based Filtering if a user has little purchase history but
switch to Collaborative Filtering when more data is available.
3. Feature Augmentation: Uses one method to enhance another.
o Example: Netflix uses Collaborative Filtering but enriches it with metadata like movie genres.
Advantage: Provides better personalization and overcomes cold start issues.
Disadvantage: More complex and computationally expensive.
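The weighted and switching strategies can be expressed in a few lines. This is a sketch only: the function names, the weight `alpha`, and the threshold `min_interactions` are hypothetical, and both input scores are assumed to be on the same 0 to 1 scale.

```python
def weighted_hybrid(cf_score, cb_score, alpha=0.7):
    """Weighted hybrid: blend collaborative and content-based scores."""
    return alpha * cf_score + (1 - alpha) * cb_score

def switching_hybrid(cf_score, cb_score, n_interactions, min_interactions=5):
    """Switching hybrid: use content-based scores until enough history exists."""
    if n_interactions < min_interactions:
        return cb_score
    return cf_score

# Same item, same scores, different strategies.
blended = weighted_hybrid(cf_score=0.8, cb_score=0.4)
new_user = switching_hybrid(0.8, 0.4, n_interactions=2)
active_user = switching_hybrid(0.8, 0.4, n_interactions=40)
print(blended, new_user, active_user)
```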

4. Knowledge-Based Recommender Systems


These systems do not rely on past user behavior. Instead, they use explicit user preferences and domain
knowledge to suggest items.
How It Works:
• Uses rules, constraints, or expert knowledge to make recommendations.
• Suitable for complex, infrequent purchases where user preferences are specific.
Examples:
• Real Estate Websites: Suggests properties based on budget, location, and property type.
• Travel Planning: Recommends destinations based on climate, budget, and activities.
• Car Buying Platforms: Suggests cars based on fuel efficiency, price, and brand preferences.
Advantage: Works well for items that users do not frequently purchase.
Disadvantage: Requires manual knowledge engineering to define rules.
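A constraint-based recommender of this kind can be as simple as filtering a catalogue by the user's stated requirements. The sketch below follows the car-buying example; the catalogue, field names, and the function `recommend_cars` are all hypothetical.

```python
# Hypothetical car catalogue; all names and figures are made up.
cars = [
    {"name": "Car A", "price": 18000, "mpg": 40, "brand": "BrandX"},
    {"name": "Car B", "price": 32000, "mpg": 28, "brand": "BrandY"},
    {"name": "Car C", "price": 22000, "mpg": 35, "brand": "BrandX"},
    {"name": "Car D", "price": 21000, "mpg": 45, "brand": "BrandZ"},
]

def recommend_cars(catalogue, max_price, min_mpg, brands=None):
    """Keep cars that satisfy every constraint, most fuel-efficient first."""
    matches = [
        car for car in catalogue
        if car["price"] <= max_price
        and car["mpg"] >= min_mpg
        and (brands is None or car["brand"] in brands)
    ]
    return sorted(matches, key=lambda car: car["mpg"], reverse=True)

# No purchase history is needed, only the stated requirements.
result = [car["name"] for car in recommend_cars(cars, max_price=25000, min_mpg=35)]
print(result)
```

The rules here are hard-coded, which mirrors the manual knowledge engineering noted as the main disadvantage.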

5. Popularity-Based Recommender Systems


• These systems recommend the most popular items based on overall user interactions.
• No personalization—simply suggests trending items.
Examples:
• Netflix Trending Section: Displays the most-watched movies.
• Amazon Bestsellers: Recommends the most purchased products.
• YouTube Trending: Shows viral videos based on watch count and engagement.
Advantage: Simple and effective for new users with no interaction history.
Disadvantage: Lacks personalization, which can reduce engagement.

Comparison of Recommender System Types


Type | Strengths | Weaknesses
Collaborative Filtering | Highly personalized, does not require item features | Struggles with new users/items (cold start problem)
Content-Based Filtering | Works well with new users, easy to explain | Struggles with new items, filter bubble effect
Hybrid Systems | More accurate, overcomes CF & CBF limitations | Computationally complex
Knowledge-Based | Good for rare, high-value items | Requires expert-defined rules
Popularity-Based | Simple and effective for new users | Lacks personalization

11. Elaborate on the distinguishing features of traditional and non-personalized recommender systems,
giving an example for each category.
Unique Differences Between Traditional and Non-Personalized Recommender Systems (With Examples)
Recommender systems can be broadly classified into Traditional (Personalized) Recommender Systems and
Non-Personalized Recommender Systems. Their primary difference lies in how they generate
recommendations—whether they adapt to individual user preferences or provide general suggestions to all users.

1. Traditional (Personalized) Recommender Systems


These systems tailor recommendations to each user based on their past behavior, preferences, or demographic
information.
Unique Features of Traditional Recommender Systems:
Personalized Recommendations:
• Generates different suggestions for different users based on their behavior or preferences.
User Interaction Dependent:
• Requires explicit (ratings, reviews) or implicit (clicks, views) feedback to make recommendations.
Adaptable to User Changes:
• Can dynamically update recommendations as user preferences evolve.
Data-Driven Predictions:
• Uses machine learning, AI, and statistical methods to predict what a user might like.
Categories & Examples:
Type | Description | Example
Collaborative Filtering | Suggests items based on what similar users liked. | Netflix recommending movies based on similar users' watch history.
Content-Based Filtering | Suggests items similar to what the user has liked before. | Spotify suggesting songs similar to those you frequently listen to.
Hybrid Filtering | Combines multiple approaches for better accuracy. | YouTube recommending videos based on both watch history and video metadata.
Key Advantage: Highly personalized recommendations improve user engagement.
Key Limitation: Struggles with new users or items due to the cold start problem.

2. Non-Personalized Recommender Systems


These systems provide the same recommendations to all users, without considering individual preferences.
Unique Features of Non-Personalized Recommender Systems:
Same Recommendations for Everyone:
• No personalization—recommendations are based on popularity, trends, or expert knowledge.
No User Data Required:
• Does not depend on user interaction history, making it useful for new users.
Simple & Fast Computation:
• Easier to implement since it does not require complex machine learning models.
Good for General Trends:
• Useful for recommending widely popular or general-interest content.
Categories & Examples:
Type | Description | Example
Popularity-Based Recommendations | Suggests the most popular items based on overall user engagement. | YouTube Trending Videos: displays the most-watched videos globally.
Best-Seller Recommendations | Recommends top-selling products, movies, or books. | Amazon’s "Best Sellers": lists products with the highest sales.
Expert-Based Recommendations | Recommendations curated by experts or editors. | Movie reviews recommending top films for a given genre.
Context-Based Recommendations | Suggests items based on location, time, or event. | Google News showing top stories based on your country.
Key Advantage: Works well for new users and requires no interaction history.
Key Limitation: Lacks personalization, which may reduce engagement over time.

Comparison: Traditional vs. Non-Personalized Recommender Systems


Feature | Traditional (Personalized) | Non-Personalized
Personalization | Yes, tailored to user preferences. | No, same recommendations for all.
User Data Required | Yes, requires user interactions. | No, does not need user data.
Adaptability | Adjusts based on user behavior. | Static, does not adapt to individual users.
Computational Complexity | High (AI, ML-based models). | Low (simple rule-based methods).
Example | Netflix, Spotify, Amazon personalized recommendations. | YouTube Trending, Amazon Bestsellers.
