Recommendation Engine Problem Statement

The document describes building a recommendation engine using collaborative filtering on a movie ratings dataset. It discusses preprocessing the data, creating user and item similarity matrices, and visualizing the most viewed movies. Code examples are provided to retrieve the data, encode genres, create sparse matrices, calculate similarities, and plot views. The goal is to build a recommendation model using item-based collaborative filtering to suggest top selling DVDs to customers.

Uploaded by

SBS Movies
Topic: Recommendation Engine

Instructions:
Please share your answers filled in-line in the word document. Submit code separately wherever applicable.

Please ensure you update all the details:


Name: Darshan BR Batch ID: DSWEMOH100721
Topic: Recommender Engine

Problem Statement: -

Q) Build a recommender system with the given data using UBCF.

This dataset is related to the video gaming industry and a survey was conducted to build a recommendation engine so that the store can improve the sales of its gaming DVDs. Snapshot of the dataset is given below. Build a Recommendation Engine and suggest top selling DVDs to the store customers.

Importing Essential Libraries

In our Data Science project, we will make use of these four packages – ‘recommenderlab’, ‘ggplot2’, ‘data.table’ and ‘reshape2’.
Code:

library(recommenderlab)

© 2013 - 2021 360DigiTMG. All Rights Reserved.


Output Screenshot:

Code:

library(ggplot2) #Author DataFlair

library(data.table)

library(reshape2)

Output Screenshot:

Retrieving the Data

We will now retrieve our data from movies.csv into movie_data dataframe and ratings.csv into rating_data. We will use the str()
function to display information about the movie_data dataframe.

Code:

setwd("/home/dataflair/data/movie_data") #Author DataFlair

movie_data <- read.csv("movies.csv",stringsAsFactors=FALSE)

rating_data <- read.csv("ratings.csv")

str(movie_data)

Output Screenshot:

We can overview the summary of the movies using the summary() function. We will also use the head() function to print the first
six lines of movie_data

Code:

summary(movie_data) #Author DataFlair

Output Screenshot:

Code:

head(movie_data)

Output Screenshot:

Similarly, we can output the summary as well as the first six lines of the ‘rating_data’ dataframe –

Code:

summary(rating_data) #Author DataFlair

Output Screenshot:

Code:

head(rating_data)

Output Screenshot:

Data Pre-processing

From the above table, we observe that the userId and movieId columns consist of integers. Furthermore, we need to convert the
genres present in the movie_data dataframe into a more usable format. To do so, we will first one-hot encode the genres,
creating a matrix that records the corresponding genres for each of the films.

Code:

movie_genre <- as.data.frame(movie_data$genres, stringsAsFactors=FALSE)

library(data.table)

movie_genre2 <- as.data.frame(tstrsplit(movie_genre[,1], '[|]',

type.convert=TRUE),

stringsAsFactors=FALSE) #DataFlair

colnames(movie_genre2) <- c(1:10)

list_genre <- c("Action", "Adventure", "Animation", "Children",

"Comedy", "Crime","Documentary", "Drama", "Fantasy",

"Film-Noir", "Horror", "Musical", "Mystery","Romance",

"Sci-Fi", "Thriller", "War", "Western")

genre_mat1 <- matrix(0,10330,18)

genre_mat1[1,] <- list_genre

colnames(genre_mat1) <- list_genre

for (index in 1:nrow(movie_genre2)) {

  for (col in 1:ncol(movie_genre2)) {

    gen_col = which(genre_mat1[1,] == movie_genre2[index,col]) #Author DataFlair

    genre_mat1[index+1,gen_col] <- 1

  }

}

genre_mat2 <- as.data.frame(genre_mat1[-1,], stringsAsFactors=FALSE) #remove first row, which was the genre list

for (col in 1:ncol(genre_mat2)) {

  genre_mat2[,col] <- as.integer(genre_mat2[,col]) #convert from characters to integers

}

str(genre_mat2)

Screenshot:

Output –
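The idea behind the one-hot encoding above can also be sketched in a few lines of plain Python. This is only an illustration, not part of the R project: the shortened genre list and the sample genre string are assumptions made for the example, and the helper name one_hot_genres is hypothetical.

```python
# Shortened, illustrative genre list (the R code above uses 18 genres)
GENRES = ["Action", "Adventure", "Animation", "Children", "Comedy"]

def one_hot_genres(genre_string, genre_list=GENRES):
    """Return a 0/1 list with one slot per genre in genre_list."""
    # movies.csv stores genres as a single "|"-separated string
    present = set(genre_string.split("|"))
    return [1 if genre in present else 0 for genre in genre_list]

# Genres not in the list (here "Fantasy") are simply ignored
toy_story = one_hot_genres("Adventure|Animation|Children|Comedy|Fantasy")
print(toy_story)  # [0, 1, 1, 1, 1]
```

Each film becomes one row of zeros and ones, which is what the genre_mat2 dataframe holds after the conversion to integers.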

In the next step of Data Pre-processing of R project, we will create a ‘search matrix’ that will allow us to perform an easy search of
the films by specifying the genre present in our list.

Code:

SearchMatrix <- cbind(movie_data[,1:2], genre_mat2[])

head(SearchMatrix) #DataFlair

Output Screenshot:

Many movies have several genres. For example, Toy Story is an animated film that also falls under the genres of Comedy,
Fantasy, and Children. This applies to the majority of the films.

For our movie recommendation system to make sense of our ratings through recommenderlab, we have to convert our rating data
into a sparse matrix of the class ‘realRatingMatrix’. This is performed as follows:

Code:

ratingMatrix <- dcast(rating_data, userId~movieId, value.var = "rating", na.rm=FALSE)

ratingMatrix <- as.matrix(ratingMatrix[,-1]) #remove userIds

#Convert rating matrix into a recommenderlab sparse matrix

ratingMatrix <- as(ratingMatrix, "realRatingMatrix")

ratingMatrix

Output Screenshot:
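What dcast() does here, reshaping long (userId, movieId, rating) rows into a user-by-movie matrix, can be sketched in plain Python as follows. The sample ratings are made up and the helper name to_user_item_matrix is hypothetical; missing ratings are kept as None, which plays the role of NA.

```python
# Made-up (userId, movieId, rating) rows in the shape of ratings.csv
ratings = [(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0), (3, 30, 2.0)]

def to_user_item_matrix(rows):
    """Pivot long-format ratings into a users x movies matrix (None = not rated)."""
    users = sorted({u for u, _, _ in rows})
    movies = sorted({m for _, m, _ in rows})
    row_of = {u: i for i, u in enumerate(users)}
    col_of = {m: j for j, m in enumerate(movies)}
    matrix = [[None] * len(movies) for _ in users]
    for u, m, r in rows:
        matrix[row_of[u]][col_of[m]] = r
    return users, movies, matrix

users, movies, matrix = to_user_item_matrix(ratings)
for user, row in zip(users, matrix):
    print(user, row)
```

Most cells stay empty because each user rates only a few films, which is why a sparse representation is a sensible choice for this data.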

Let us now look at the recommendation models that recommenderlab offers for a realRatingMatrix, along with their
descriptions –

Code:

recommendation_model <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")

names(recommendation_model)

Output Screenshot:

Code:

lapply(recommendation_model, "[[", "description")

Output Screenshot:

We will implement a single model in our R project – Item Based Collaborative Filtering.

Code:

recommendation_model$IBCF_realRatingMatrix$parameters

Output Screenshot:

Exploring Similar Data

Collaborative Filtering involves suggesting movies to a user based on preferences collected from many other users. For
example, if user A likes to watch action films and so does user B, then the movies that user B watches in the future will be
recommended to A and vice-versa. Recommending movies therefore depends on establishing a similarity relationship between
the two users. With the help of recommenderlab, we can compute similarities using various methods such as cosine, pearson
and jaccard.

Code:

similarity_mat <- similarity(ratingMatrix[1:4, ],

method = "cosine",

which = "users")

as.matrix(similarity_mat)

image(as.matrix(similarity_mat), main = "User's Similarities")

Output Screenshot:

In the above matrix, each row and column represents a user. We have taken four users and each cell in this matrix represents the
similarity that is shared between the two users.
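To make the cosine measure concrete, here is a minimal plain-Python sketch of the similarity between two users. It is an illustration only: restricting the calculation to the films both users rated is an assumption of this sketch, and recommenderlab may treat missing ratings differently. The function name and the sample ratings are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two rating lists, using only co-rated positions."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up ratings of two users over three films (None = not rated)
user_a = [1.0, 2.0, None]
user_b = [2.0, 4.0, 5.0]
print(cosine_similarity(user_a, user_b))
```

In this example the two users rate the shared films in the same proportions, so their similarity is at its maximum; users with no films in common get a similarity of 0 in this sketch.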

Now, we delineate the similarity that is shared between the films –

Code:

movie_similarity <- similarity(ratingMatrix[, 1:4], method =

"cosine", which = "items")

as.matrix(movie_similarity)

image(as.matrix(movie_similarity), main = "Movies similarity")

Output Screenshot:

Let us now extract the unique ratings –

rating_values <- as.vector(ratingMatrix@data)

unique(rating_values) # extracting unique ratings

Now, we will create a table of ratings that displays the count of each unique rating value.

Code:

Table_of_Ratings <- table(rating_values) # creating a count of movie ratings

Table_of_Ratings

Output Screenshot:

Most Viewed Movies Visualization

In this section of the machine learning project, we will explore the most viewed movies in our dataset. We will first count the
number of views for each film and then organize them in a table sorted in descending order of views.

Code:

library(ggplot2)

movie_views <- colCounts(ratingMatrix) # count views for each movie

table_views <- data.frame(movie = names(movie_views),

views = movie_views) # create dataframe of views

table_views <- table_views[order(table_views$views,

decreasing = TRUE), ] # sort by number of views

table_views$title <- NA

for (index in 1:nrow(table_views)) {

  table_views[index,3] <- as.character(subset(movie_data,

                                       movie_data$movieId == table_views[index,1])$title)

}

table_views[1:6,]

Input Screenshot:

Output –

Now, we will visualize a bar plot for the total number of views of the top films. We will carry this out using ggplot2.

Code:

ggplot(table_views[1:6, ], aes(x = title, y = views)) +

geom_bar(stat="identity", fill = 'steelblue') +

geom_text(aes(label=views), vjust=-0.3, size=3.5) +

theme(axis.text.x = element_text(angle = 45, hjust = 1)) +

ggtitle("Total Views of the Top Films")

Input Screenshot:

Output:

From the above bar-plot, we observe that Pulp Fiction is the most-watched film followed by Forrest Gump.

Heatmap of Movie Ratings

Now, in this data science project of Recommendation system, we will visualize a heatmap of the movie ratings. This heatmap will
contain the first 20 rows and 25 columns as follows –

Code:

image(ratingMatrix[1:20, 1:25], axes = FALSE, main = "Heatmap of the first 20 rows and 25 columns")

Input Screenshot:

Output:

Performing Data Preparation

We will conduct data preparation in the following three steps –

• Selecting useful data.
• Normalizing data.
• Binarizing the data.

For finding useful data in our dataset, we set the threshold for the minimum number of users who have rated a film to 50.
The same threshold applies to the minimum number of films rated per user. This way, we filter out the least-watched films
and the least active users.

Code:

movie_ratings <- ratingMatrix[rowCounts(ratingMatrix) > 50,

colCounts(ratingMatrix) > 50]

movie_ratings

Output Screenshot:
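The filtering step can be illustrated with a small plain-Python sketch. As in the R code, the row and column counts are both taken on the full matrix before anything is dropped. The tiny matrix, the threshold of 1 and the helper name filter_matrix are assumptions made for the example (the project itself uses a threshold of 50).

```python
def filter_matrix(matrix, min_count):
    """Keep rows and columns that have more than min_count ratings (None = not rated)."""
    n_cols = len(matrix[0]) if matrix else 0
    keep_rows = [i for i, row in enumerate(matrix)
                 if sum(r is not None for r in row) > min_count]
    keep_cols = [j for j in range(n_cols)
                 if sum(row[j] is not None for row in matrix) > min_count]
    return [[matrix[i][j] for j in keep_cols] for i in keep_rows]

# Made-up 3 users x 3 films matrix
sample = [[1, 2, None],
          [3, None, None],
          [4, 5, 6]]
print(filter_matrix(sample, min_count=1))  # [[1, 2], [4, 5]]
```

The second user (one rating) and the third film (one rating) fall below the threshold and are removed.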

From the above output of ‘movie_ratings’, we observe that there are 420 users and 447 films as opposed to the previous 668 users
and 10325 films. We can now visualize the matrix of the most relevant users and movies as follows –

Code:

minimum_movies<- quantile(rowCounts(movie_ratings), 0.98)

minimum_users <- quantile(colCounts(movie_ratings), 0.98)

image(movie_ratings[rowCounts(movie_ratings) > minimum_movies,

colCounts(movie_ratings) > minimum_users],

main = "Heatmap of the top users and movies")

Input Screenshot:

Output:

Now, we will visualize the distribution of the average ratings per user.

average_ratings <- rowMeans(movie_ratings)

qplot(average_ratings, fill=I("steelblue"), col=I("red")) +

ggtitle("Distribution of the average rating per user")

Output Screenshot:

Output:

Data Normalization

Some users give consistently high or consistently low ratings to all of the films they watch. This acts as a bias while
implementing our model. In order to remove it, we normalize our data. Normalization is a data preparation procedure that
brings numerical values to a common scale without distorting the differences between them. Here, normalization centers the
ratings so that the average rating of each user becomes 0. We then plot a heatmap of our normalized ratings.

Code:

normalized_ratings <- normalize(movie_ratings)

sum(rowMeans(normalized_ratings) > 0.00001)

image(normalized_ratings[rowCounts(normalized_ratings) > minimum_movies,

colCounts(normalized_ratings) > minimum_users],

main = "Normalized Ratings of the Top Users")

Output Screenshot:

Output:
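The centering idea can be sketched in plain Python: subtract the mean of each user's ratings from that user's ratings, leaving unrated films untouched. This is a minimal illustration with made-up numbers and a hypothetical helper name, not the recommenderlab implementation.

```python
def center_rows(matrix):
    """Subtract each user's mean rating from that user's ratings (None = not rated)."""
    centered = []
    for row in matrix:
        rated = [r for r in row if r is not None]
        mean = sum(rated) / len(rated) if rated else 0.0
        centered.append([None if r is None else r - mean for r in row])
    return centered

# One generous rater and one harsh rater
sample = [[5, 3, None],
          [1, None, 1]]
print(center_rows(sample))  # [[1.0, -1.0, None], [0.0, None, 0.0]]
```

After centering, a positive value means "above this user's usual rating" and a negative value means "below it", regardless of how generous the user is overall.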

Performing Data Binarization

In the final step of our data preparation in this data science project, we will binarize our data. Binarizing the data means that we
have two discrete values 1 and 0, which will allow our recommendation systems to work more efficiently. We will define a matrix
that will consist of 1 if the rating is 3 or above and otherwise it will be 0.

Code:

binary_minimum_movies <- quantile(rowCounts(movie_ratings), 0.95)

binary_minimum_users <- quantile(colCounts(movie_ratings), 0.95)

#movies_watched <- binarize(movie_ratings, minRating = 1)

good_rated_films <- binarize(movie_ratings, minRating = 3)

image(good_rated_films[rowCounts(movie_ratings) > binary_minimum_movies,

colCounts(movie_ratings) > binary_minimum_users],

main = "Heatmap of the top users and movies")

Input Screenshot:

Output:
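Binarization with a minimum rating can be sketched in plain Python as below. The sample matrix and the helper name are made up; treating unrated films as 0 is an assumption of this sketch.

```python
def binarize(matrix, min_rating=3):
    """Return 1 where a rating exists and is at least min_rating, else 0."""
    return [[1 if (r is not None and r >= min_rating) else 0 for r in row]
            for row in matrix]

sample = [[5, 3, None],
          [2.5, None, 1]]
print(binarize(sample))  # [[1, 1, 0], [0, 0, 0]]
```

A rating of exactly 3 is coded as 1 here, matching a threshold of "3 or above".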

Collaborative Filtering System

In this section of data science project, we will develop our very own Item Based Collaborative Filtering System. This type of
collaborative filtering finds similarity between items based on the ratings people give them. The algorithm first builds a
table of similar items from the items that the same customers have purchased together. This table is then fed into the
recommendation system.

The similarity between single products and related products can be determined with the following algorithm –

• For each item i1 present in the product catalog, purchased by customer C.
• And, for each item i2 also purchased by the customer C.
• Create a record that the customer purchased items i1 and i2.
• Calculate the similarity between i1 and i2.
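The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the baskets are made up, the helper names are hypothetical, and the similarity used (co-purchase count divided by the square root of the product of the two items' purchase counts, i.e. cosine similarity on purchased/not-purchased vectors) is one possible choice rather than the only one.

```python
import math
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count, for every pair of items, how many customers bought both."""
    counts = Counter()
    for items in baskets:
        for i1, i2 in combinations(sorted(set(items)), 2):
            counts[(i1, i2)] += 1
    return counts

def item_similarity(baskets):
    """Cosine similarity between items, based on who purchased them."""
    item_counts = Counter(i for items in baskets for i in set(items))
    return {pair: n / math.sqrt(item_counts[pair[0]] * item_counts[pair[1]])
            for pair, n in co_purchase_counts(baskets).items()}

# One list of purchased items per customer (made-up data)
baskets = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
pair_counts = co_purchase_counts(baskets)
similarities = item_similarity(baskets)
print(pair_counts[("A", "B")])   # 2
print(similarities[("A", "C")])  # 0.5
```

Items that are often bought by the same customers end up with a high similarity, which is the table the item-based recommender works from.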
We will build this filtering system by splitting the dataset into 80% training set and 20% test set.

Code:

sampled_data<- sample(x = c(TRUE, FALSE),

size = nrow(movie_ratings),

replace = TRUE,

prob = c(0.8, 0.2))

training_data <- movie_ratings[sampled_data, ]

testing_data <- movie_ratings[!sampled_data, ]

Input Screenshot:
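The same kind of random split can be sketched in plain Python. Each row is assigned to the training set with probability 0.8, so the split is only approximately 80/20, just as with sample() above. The seed, the helper name and the sample rows are assumptions made for the illustration.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=0):
    """Randomly assign each row to the training or the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    mask = [rng.random() < train_fraction for _ in rows]
    train = [r for r, keep in zip(rows, mask) if keep]
    test = [r for r, keep in zip(rows, mask) if not keep]
    return train, test

train, test = split_rows(list(range(100)))
print(len(train), len(test))
```

Every row lands in exactly one of the two sets, so the model is never evaluated on users it was trained on.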

Building the Recommendation System using R

We will now explore the various parameters of our Item Based Collaborative Filter. These parameters have default values. Here,
k denotes the number of most similar items to keep for each item when computing similarities, and it is equal to 30 by default.
The algorithm will therefore identify the k most similar items for each item and store them. We use the cosine method, which is
the default one, but you can also use the pearson method.

Code:

recommendation_system <- recommenderRegistry$get_entries(dataType ="realRatingMatrix")

recommendation_system$IBCF_realRatingMatrix$parameters

Output Screenshot:

Code:

recommen_model <- Recommender(data = training_data,

method = "IBCF",

parameter = list(k = 30))

recommen_model

class(recommen_model)

Output Screenshot:

Let us now explore our data science recommendation system model as follows –

Using the getModel() function, we will retrieve the details of recommen_model. We will then find the class and dimensions of the
similarity matrix contained within model_info. Finally, we will generate a heatmap of the first 20 items to visualize the
similarity shared between them.

Code:

model_info <- getModel(recommen_model)

class(model_info$sim)

dim(model_info$sim)

top_items <- 20

image(model_info$sim[1:top_items, 1:top_items],

main = "Heatmap of the first rows and columns")

Output Screenshot:

Output:

In the next step of ML project, we will count, for each row and each column, the number of similarity values above 0. We will
visualize the column counts through a distribution as follows –

Code:

sum_rows <- rowSums(model_info$sim > 0)

table(sum_rows)

sum_cols <- colSums(model_info$sim > 0)

qplot(sum_cols, fill=I("steelblue"), col=I("red"))+ ggtitle("Distribution of the column count")

Output Screenshot:

Output:

How to build Recommender System on dataset using R?

We will create a top_recommendations variable, initialized to 10, which specifies the number of films to recommend to each user.
We will then use the predict() function, which identifies similar items and ranks them appropriately. Here, each rating is used
as a weight. Each weight is multiplied by the related similarities. Finally, the weighted values are added up.

Code:

top_recommendations <- 10 # the number of items to recommend to each user

predicted_recommendations <- predict(object = recommen_model,

newdata = testing_data,

n = top_recommendations)

predicted_recommendations

Output Screenshot:
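The weighted sum described above can be sketched in plain Python. This is an illustration of the idea only: the similarity table and ratings are made up, the helper name is hypothetical, and the actual predict() method in recommenderlab may apply further steps (for example, normalizing the scores) that are not shown here.

```python
def rank_items(user_ratings, similar_items, n=10):
    """Score unrated items by the sum of (rating x similarity) and return the top n."""
    scores = {}
    for candidate, neighbours in similar_items.items():
        if candidate in user_ratings:
            continue  # do not recommend what the user has already rated
        total = sum(sim * user_ratings[item]
                    for item, sim in neighbours.items() if item in user_ratings)
        if total > 0:
            scores[candidate] = total
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Made-up data: the user rated films A and B
user_ratings = {"A": 5, "B": 3}
# For each film, its most similar films and their similarities
similar_items = {"C": {"A": 0.5, "B": 0.2},
                 "D": {"A": 0.1},
                 "A": {"B": 0.9}}
print(rank_items(user_ratings, similar_items))  # ['C', 'D']
```

Film C scores 5 x 0.5 + 3 x 0.2 = 3.1 and film D scores 5 x 0.1 = 0.5, so C is ranked first.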

Code:

user1 <- predicted_recommendations@items[[1]] # recommendation for the first user

movies_user1 <- predicted_recommendations@itemLabels[user1]

movies_user2 <- movies_user1

for (index in 1:10){

  movies_user2[index] <- as.character(subset(movie_data,

                                      movie_data$movieId == movies_user1[index])$title)

}

movies_user2

Output Screenshot:

Output:

Code:

recommendation_matrix <- sapply(predicted_recommendations@items,

function(x){ as.integer(colnames(movie_ratings)[x]) }) # matrix with the recommendations for each user

#dim(recommendation_matrix)

recommendation_matrix[,1:4]

Output Screenshot:

Output:

Summary

Recommendation Systems are among the most popular machine learning applications and are used across many sectors. Unlike
traditional classification algorithms, they can take many kinds of input and rank items by similarity to provide the user with
relevant results. These recommendation systems have evolved over time and have incorporated many advanced machine learning
techniques to provide the users with the content that they want.

Problem Statement: -

The Entertainment Company, which is an online movie watching platform, wants to improve its collection of movies and showcase
those that are highly rated and recommend those movies to its customer by their movie watching footprint. For this, the company
has collected the data and shared it with you to provide some analytical insights and also to come up with a recommendation
algorithm so that it can automate its process for effective recommendations. The ratings are between -9 and +9.

Step 1: Perform Exploratory Data Analysis (EDA) on the data

The dataset contains two CSV files, credits, and movies. The credits file contains all the metadata information about the movie and
the movie file contains the information like name and id of the movie, budget, languages in the movie that has been released, etc.

Let’s load the movie dataset using pandas.

import pandas as pd
path = "./Desktop/TechVidvan/movie_recommendation"
credits_df = pd.read_csv(path + "/tmdb_credits.csv")
movies_df = pd.read_csv(path + "/tmdb_movies.csv")
Let’s have a peek at our dataframes.

movies_df.head()
Output:

credits_df.head()
Output:

We only need the id, title, cast, and crew columns of the credits dataframe. Let’s merge the dataframes into one on the column
‘id’.

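credits_df.columns = ['id','title','cast','crew']
# Assumption: movies_df already has its own "title" column. Dropping the duplicate
# from credits_df avoids ending up with title_x/title_y after the merge.
movies_df = movies_df.merge(credits_df.drop(columns="title"), on="id")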
Our new dataframe would be:

movies_df.head()
Output:

Step 2: Build the Movie Recommender System

A recommendation system can personalize its predictions using the “plot/description” of the movie.

But the quality of suggestions can be further improved using the metadata of the movie. Let’s say the query to our movie
recommendation engine is “The Dark Knight Rises”. Then the predictions should also include movies directed by the director of the
film. They should also include movies that share cast members with the query movie.

For that, we utilize the following features to personalize the recommendation: cast, crew, keywords, genres.

In the dataframe, these features are stored as strings that represent lists, so we need to convert them into a safe and usable
structure. Let’s apply the literal_eval() function to the features.

from ast import literal_eval

features = ["cast", "crew", "keywords", "genres"]
for feature in features:
    # Parse each stringified list into a real Python list
    movies_df[feature] = movies_df[feature].apply(literal_eval)
movies_df[features].head(10)
Output:
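To see what literal_eval() does to a single cell, here is a small standalone sketch. The string below is made up to resemble a "genres" entry; the exact fields in the real dataset may differ.

```python
from ast import literal_eval

# A made-up cell value: a list of dictionaries stored as one string
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = literal_eval(raw)                    # now a real Python list of dicts
names = [entry["name"] for entry in parsed]   # pull out just the names

print(type(parsed).__name__)  # list
print(names)                  # ['Action', 'Adventure']
```

Unlike eval(), literal_eval() only accepts Python literals (strings, numbers, lists, dicts and so on), which makes it a safer choice for parsing data read from a file.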

Let’s write some functions to extract information like director from the above features.

The get_director() function extracts the name of the director of the movie.

import numpy as np

def get_director(x):
    # Return the name of the first crew member whose job is "Director"
    for i in x:
        if i["job"] == "Director":
            return i["name"]
    return np.nan
The get_list() function returns the first 3 elements, or the entire list if it has 3 or fewer.

def get_list(x):
    if isinstance(x, list):
        names = [i["name"] for i in x]
        # Keep at most the first three names
        if len(names) > 3:
            names = names[:3]
        return names
    return []
Let’s apply both the functions get_director() and get_list() to our dataset.

movies_df["director"] = movies_df["crew"].apply(get_director)
features = ["cast", "keywords", "genres"]
for feature in features:
    movies_df[feature] = movies_df[feature].apply(get_list)
In the above code, we passed the “crew” information to the get_director() function, extracted the name, and created a new
column “director”.

For the features cast, keywords and genres, we extracted the top three entries by applying the get_list() function.

Let’s see what the data looks like after the above transformations.

movies_df[['title', 'cast', 'director', 'keywords', 'genres']].head()


Output:

The next step would be to convert the above feature instances into lowercase and remove all the spaces between them.

def clean_data(row):
    if isinstance(row, list):
        return [str.lower(i.replace(" ", "")) for i in row]
    else:
        if isinstance(row, str):
            return str.lower(row.replace(" ", ""))
        else:
            return ""
features = ['cast', 'keywords', 'director', 'genres']
for feature in features:
    movies_df[feature] = movies_df[feature].apply(clean_data)
Now, let’s create a “soup” containing all of the metadata information extracted to input into the vectorizer.

def create_soup(features):
    return ' '.join(features['keywords']) + ' ' + ' '.join(features['cast']) + ' ' + features['director'] + ' ' + ' '.join(features['genres'])
movies_df["soup"] = movies_df.apply(create_soup, axis=1)
print(movies_df["soup"].head())
Output:

Our movie recommendation engine works by suggesting movies to the user based on the metadata information. The similarity
between the movies is calculated and then used to make recommendations. For that, our text data should be preprocessed and
converted into vectors using the CountVectorizer. As the name suggests, CountVectorizer counts the frequency of each word
and outputs a 2D matrix containing the frequencies.

We don’t take into account words like a, an, the (these are called “stopwords”) because these words occur very frequently
in text and carry little meaning.

There exist several similarity score functions such as cosine similarity, Pearson correlation coefficient, etc. Here, we use the cosine
similarity score, as it is just the dot product of the vectors output by the CountVectorizer divided by the product of their lengths.
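The following standalone sketch shows the idea on three tiny "soups" using only the Python standard library. It is a simplified stand-in for CountVectorizer and cosine_similarity, not their actual implementation: words are split on whitespace, no stopwords are removed, and the soups themselves are made up.

```python
import math
from collections import Counter

def count_vector(text):
    """Count how often each word occurs in the text."""
    return Counter(text.split())

def cosine(c1, c2):
    """Dot product of two count vectors divided by the product of their lengths."""
    dot = sum(c1[word] * c2[word] for word in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

# Made-up soups in the style produced by create_soup()
soup_a = count_vector("batman gotham christianbale christophernolan action")
soup_b = count_vector("batman joker christianbale christophernolan action crime")
soup_c = count_vector("toy friendship tomhanks johnlasseter animation")

cos_ab = cosine(soup_a, soup_b)
cos_ac = cosine(soup_a, soup_c)
print(round(cos_ab, 4))  # 0.7303
print(cos_ac)            # 0.0
```

The first two soups share four of their words and score high, while the third shares none and scores 0. This is also why names were lowercased and stripped of spaces earlier: "christianbale" counts as a single token instead of two unrelated words.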

We also reset the indices of our dataframe.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

count_vectorizer = CountVectorizer(stop_words="english")
count_matrix = count_vectorizer.fit_transform(movies_df["soup"])
print(count_matrix.shape)
cosine_sim2 = cosine_similarity(count_matrix, count_matrix)
print(cosine_sim2.shape)
movies_df = movies_df.reset_index()
indices = pd.Series(movies_df.index, index=movies_df['title'])
Output:

Create a reverse mapping of movie titles to indices. With this, we can easily find the index of a movie based on its title.

indices = pd.Series(movies_df.index, index=movies_df["title"]).drop_duplicates()
print(indices.head())
Output:

Step 3: Get recommendations for the movies

The get_recommendations() function takes the title of the movie and the similarity matrix as input. It follows the below steps to
make recommendations.

• Get the index of the movie using the title.
• Get the list of similarity scores of that movie with respect to all the movies.
• Enumerate them (create tuples), with the first element being the index and the second element the cosine similarity score.
• Sort the list of tuples in descending order based on the similarity score.
• Get the list of the indices of the top 10 movies from the above sorted list. Exclude the first element because it is the movie itself.
• Map those indices to their respective titles and return the movies list.

Create a function that takes in the movie title and the cosine similarity matrix as input and outputs the top 10 movies similar to it.

def get_recommendations(title, cosine_sim=cosine_sim2):
    idx = indices[title]
    similarity_scores = list(enumerate(cosine_sim[idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    # Skip the first entry (the movie itself) and keep the next 10
    similarity_scores = similarity_scores[1:11]
    # (a, b) where a is id of movie, b is similarity_scores
    movies_indices = [ind[0] for ind in similarity_scores]
    movies = movies_df["title"].iloc[movies_indices]
    return movies

print("################ Content Based System #############")
print("Recommendations for The Dark Knight Rises")
print(get_recommendations("The Dark Knight Rises", cosine_sim2))
print()
print("Recommendations for Avengers")
print(get_recommendations("The Avengers", cosine_sim2))
Output:

Here goes our movie recommendation engine.

Summary

In this machine learning project, we built a movie recommendation system: a content-based recommendation engine that
makes recommendations given the title of a movie as input.
