
Advanced AI and ML 21AI71

Module 4 - Recommender System

4.1 OVERVIEW
• Recommendation systems are a set of algorithms that recommend the most relevant
items to users based on their predicted preferences.
• They act on behavioral data, such as a customer’s previous purchases, ratings, or reviews, to
predict the likelihood of the customer buying a new product or service.
• Amazon’s “Customers who buy this item also bought” and Netflix’s “shows and movies
you may want to watch” are examples of recommendation systems.
• Recommender systems are widely used for recommending products such as movies,
music, news, books, articles, and groceries, and act as a backbone for cross-selling across
industries.

4.1.1 Datasets
To explore the algorithms, we will use the following two publicly available datasets
to build recommendations.
1. groceries.csv: This dataset contains transactions of a grocery store and can be
downloaded from
http://cox.csueastbay.edu/~esuess/stat452/

2. MovieLens: This dataset contains 20,000,263 ratings and 465,564 tag applications
across 27,278 movies. According to the data source, these data were created by 138,493
users between January 09, 1995 and March 31, 2015. The dataset was generated on
October 17, 2016. Users were selected at random for inclusion, and all selected users had
rated at least 20 movies. The dataset can be downloaded from
https://grouplens.org/datasets/movielens/

1 Deepak D, Asst. Prof., Dept. of AIML, Canara Engineering College, Mangaluru


4.2 ASSOCIATION RULES (ASSOCIATION RULE MINING)

• Association rule mining finds combinations of items that frequently occur together in orders
or baskets (in a retail context).
• The items that frequently occur together are called itemsets. Itemsets help to discover
relationships between items that people buy together, which can be used as a basis for
strategies such as combining products into a combo offer or placing products next to
each other on retail shelves to attract customer attention.
• An application of association rule mining is Market Basket Analysis (MBA).
MBA is a technique used mostly by retailers to find associations between items
purchased by customers.

To illustrate the association rule mining concept, let us consider a set of baskets and the items
in those baskets purchased by customers, as depicted in Figure 9.1.

Items purchased in different baskets are:

1. Basket 1: egg, beer, sugar, bread, diaper
2. Basket 2: egg, beer, cereal, bread, diaper
3. Basket 3: milk, beer, bread
4. Basket 4: cereal, diaper, bread

• The primary objective of a recommender system is to predict items that a customer
may purchase in the future based on his/her purchases so far.
• In future, if a customer buys beer, can we predict what he/she is most likely to buy
along with beer? To predict this, we need to find out which items have shown a strong
association with beer in previously purchased baskets. We can use association rule
mining technique to find this out.


• Association rule mining considers all possible combinations of items in the previous baskets
and computes measures such as support, confidence, and lift to identify the rules
with the strongest associations.
• One of the challenges in association rule mining is the number of item combinations
that must be considered: as the number of unique items sold by the seller increases,
the number of possible associations grows exponentially.
• One solution to this problem is to eliminate items that cannot possibly be part of any
frequent itemset. The Apriori algorithm is one such approach used for association rule mining.
• The Apriori algorithm was proposed by Agrawal and Srikant (1994).
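
The pruning idea described above can be sketched in a few lines of plain Python. This is a minimal illustration on the four example baskets, not the implementation used later in these notes; the function names support and apriori_sketch and the 0.5 threshold are assumptions chosen for the example.

```python
from itertools import combinations

# The four example baskets from this section.
baskets = [
    {"egg", "beer", "sugar", "bread", "diaper"},
    {"egg", "beer", "cereal", "bread", "diaper"},
    {"milk", "beer", "bread"},
    {"cereal", "diaper", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def apriori_sketch(baskets, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    items = sorted({item for basket in baskets for item in basket})
    # Level 1: keep only the single items that are frequent on their own.
    current = [frozenset([item]) for item in items
               if support(frozenset([item]), baskets) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, baskets)
        k += 1
        # Candidates of size k are unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: a candidate survives only if all of its
        # (k-1)-item subsets were frequent at the previous level.
        previous = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))]
        current = [c for c in candidates if support(c, baskets) >= min_support]
    return frequent


frequent = apriori_sketch(baskets, min_support=0.5)
for itemset, value in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

Because milk and sugar each appear in only one basket, they are dropped at the first level and no larger combination containing them is ever counted.
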
The rules generated are represented as

{diaper} → {beer}

which means that customers who purchased diapers also purchased beer in the same
basket. {diaper, beer} together is called the itemset, {diaper} is called the antecedent, and
{beer} is called the consequent.

Both antecedents and consequents can have multiple items; a rule with several items on
either side of the arrow is equally valid.

4.2.1 Metrics
Concepts such as support, confidence, and lift are used to generate association rules.
1. Support
• Support indicates the frequencies of items appearing together in baskets with respect
to all possible baskets being considered (or in a sample).
• For example, the support for (beer, diaper) will be 2/4 (based on the data shown in
Figure 9.1), that is, 50% as it appears together in 2 baskets out of 4 baskets.


2. Confidence
• Confidence measures the proportion of the transactions that contain X, which also
contain Y. X is called antecedent and Y is called consequent.
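• Confidence can be calculated using the following formula:

$$\text{Confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{\text{Support}(X, Y)}{\text{Support}(X)}$$

where P(Y|X) is the conditional probability of Y given X.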

3. Lift
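Lift is calculated using the following formula:

$$\text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X, Y)}{\text{Support}(X) \times \text{Support}(Y)}$$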

• Lift can be interpreted as the degree of association between two items.

• A lift value of 1 indicates that the items are independent (no association). A lift value of less
than 1 implies that the products are substitutes (purchasing one product decreases
the probability of purchasing the other), and a lift value greater than 1
indicates that purchasing Product X increases the probability of purchasing Product Y.
• A lift value greater than 1 is a necessary condition for generating association rules.
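
As a worked example, the three metrics can be computed by hand for the four example baskets. The sketch below is plain Python; the helper names support, confidence, and lift are illustrative assumptions, and the definitions follow the standard ones (confidence as support of the pair over support of the antecedent, lift as confidence over support of the consequent).

```python
# The four example baskets from Section 4.2.
baskets = [
    {"egg", "beer", "sugar", "bread", "diaper"},
    {"egg", "beer", "cereal", "bread", "diaper"},
    {"milk", "beer", "bread"},
    {"cereal", "diaper", "bread"},
]


def support(items):
    """Fraction of baskets containing all the given items."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(x, y):
    """P(Y | X): support of X and Y together divided by support of X."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """Support of X and Y together divided by the product of their supports."""
    return support(set(x) | set(y)) / (support(x) * support(y))


for x, y in [({"beer"}, {"diaper"}), ({"cereal"}, {"diaper"})]:
    print(sorted(x), "->", sorted(y),
          "support:", support(x | y),
          "confidence:", round(confidence(x, y), 3),
          "lift:", round(lift(x, y), 3))
```

In this tiny sample, beer and diaper appear together in 2 of 4 baskets (support 0.5), but each appears alone in 3 of 4, so the lift is 0.5 / (0.75 × 0.75), which is below 1. With only four baskets the numbers are illustrative rather than meaningful.
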

4.2.2 Applying Association Rules

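In this section we apply association rule mining to the transaction data in groceries.csv. This
involves loading, encoding, and analysing the transactions to uncover patterns and
associations in customer purchasing behavior.

1. Loading the Transactions

Each line of groceries.csv is one transaction, with its items separated by commas.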

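all_txns = []
# Read every line of the file; each line is one transaction
with open('groceries.csv') as f:
    content = f.readlines()
    txns = [x.strip() for x in content]  # Remove surrounding whitespace and newlines
    for each_txn in txns:
        # Split the comma-separated line into a list of items
        all_txns.append(each_txn.split(','))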


2. Encoding the Transactions

Convert the list of transactions into a one-hot-encoded matrix for easier rule generation.
Library: mlxtend provides TransactionEncoder for this purpose (older releases exposed the
same functionality as OnehotTransactions, which has since been deprecated).

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Learn the set of unique items, then encode each transaction as one row
one_hot_encoding = TransactionEncoder()
one_hot_txns = one_hot_encoding.fit(all_txns).transform(all_txns)
one_hot_txns_df = pd.DataFrame(one_hot_txns, columns=one_hot_encoding.columns_)

Matrix Structure: Rows represent transactions; columns represent items, with True (1) for
purchased items and False (0) otherwise.
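
To make the matrix structure concrete, the following library-free sketch encodes the four example baskets from Section 4.2 by hand. It is only an illustration of what the encoder produces; the variable names are assumptions and the real groceries.csv matrix is far wider.

```python
# The four example baskets, written as lists of items.
txns = [
    ["egg", "beer", "sugar", "bread", "diaper"],
    ["egg", "beer", "cereal", "bread", "diaper"],
    ["milk", "beer", "bread"],
    ["cereal", "diaper", "bread"],
]

# One column per unique item, in alphabetical order.
items = sorted({item for txn in txns for item in txn})

# One row per transaction: 1 if the item was purchased, 0 otherwise.
matrix = [[1 if item in txn else 0 for item in items] for txn in txns]

print(items)
for row in matrix:
    print(row)
```
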

3. Generating Association Rules

Use the Apriori algorithm to find frequent itemsets with a specified minimum support
threshold.
The apriori() function takes the following parameters:
1. df: A pandas DataFrame in one-hot-encoded format.
2. min_support: A float between 0 and 1 giving the minimum support of the itemsets
returned. Default is 0.5.
3. use_colnames: A boolean. If True, the DataFrame’s column names are used in the returned
DataFrame instead of column indices.


We will use a minimum support of 0.02, that is, an itemset must appear in at least 2% of
all transactions.

from mlxtend.frequent_patterns import apriori

frequent_itemsets = apriori(one_hot_txns_df, min_support=0.02,
                            use_colnames=True)

frequent_itemsets.sample(10, random_state=90)

4. Creating Association Rules

Use association_rules to generate rules from frequent itemsets, with lift as the evaluation
metric.
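
The code for this step is not reproduced in these notes. In mlxtend it is usually a call along the lines of association_rules(frequent_itemsets, metric="lift", min_threshold=1), but the exact signature differs between releases, so treat that call as an assumption to check against the installed version. The library-free sketch below shows what the step computes, using the four example baskets from Section 4.2; the function name rules_from_itemset and the lift threshold of 1 are illustrative assumptions.

```python
from itertools import combinations

# The four example baskets from Section 4.2.
baskets = [
    {"egg", "beer", "sugar", "bread", "diaper"},
    {"egg", "beer", "cereal", "bread", "diaper"},
    {"milk", "beer", "bread"},
    {"cereal", "diaper", "bread"},
]


def support(itemset):
    """Fraction of baskets containing every item in itemset."""
    return sum(set(itemset) <= basket for basket in baskets) / len(baskets)


def rules_from_itemset(itemset, min_lift=1.0):
    """Split one frequent itemset into every antecedent -> consequent rule."""
    items = frozenset(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            antecedent = frozenset(antecedent)
            consequent = items - antecedent
            conf = support(items) / support(antecedent)
            lift = conf / support(consequent)
            # Keep only rules whose lift reaches the threshold.
            if lift >= min_lift:
                rules.append({
                    "antecedent": antecedent,
                    "consequent": consequent,
                    "support": support(items),
                    "confidence": conf,
                    "lift": lift,
                })
    return rules


rules = rules_from_itemset({"cereal", "diaper"}) + rules_from_itemset({"beer", "diaper"})
# Sort the surviving rules by confidence, highest first.
rules.sort(key=lambda rule: rule["confidence"], reverse=True)
for rule in rules:
    print(sorted(rule["antecedent"]), "->", sorted(rule["consequent"]),
          round(rule["confidence"], 3), round(rule["lift"], 3))
```

Both rules derived from {beer, diaper} are filtered out here because their lift is below 1 in this toy data, while the two rules from {cereal, diaper} are kept.
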


The corresponding association rules are

Let us look at the top 10 association rules sorted by confidence. The rules stored in the
variable rules are sorted by confidence in descending order.

From Table 9.4, we can infer that the probability that a customer buys (whole milk), given
he/she has bought (yogurt, other vegetables), is 0.51.


4.3 COLLABORATIVE FILTERING

• Collaborative filtering is based on the notion of similarity (or distance).
• For example, if two users A and B have purchased the same products and have rated
them similarly on a common rating scale, then A and B can be considered similar in
their buying and preference behavior.
• Hence, if A buys a new product and rates it highly, that product can be recommended
to B. Alternatively, products that A has already bought and rated highly can be
recommended to B, if B has not already bought them.

4.3.1 How to Find Similarity between Users

• The similarity or distance between users can be computed from the ratings the users
have given to the items they have in common.
• If the users are similar, similarity measures such as the Jaccard coefficient and
cosine similarity will have a value closer to 1, and distance measures such as the Euclidean
distance will have a low value.
• Example: The picture in Figure 9.2 depicts three users Rahul, Purvi, and Gaurav and
the books they have bought and rated.


The users are plotted by their ratings in Euclidean space in Figure 9.3. Here the
dimensions are the two books Into Thin Air and Missoula, which are the
books bought by all three users: Rahul, Purvi, and Gaurav.

Figure 9.3 shows that Rahul’s preferences are similar to Purvi’s rather than to Gaurav’s. So,
the other book, Into the Wild, which Rahul has bought and rated high, can now be
recommended to Purvi.
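
The comparison can be sketched numerically. The ratings in the figure are not reproduced in these notes, so the values below are hypothetical numbers chosen only to match the described outcome (Rahul closer to Purvi than to Gaurav); the helper names are also assumptions.

```python
import math

# HYPOTHETICAL ratings for (Into Thin Air, Missoula); not the values from the figure.
ratings = {
    "Rahul": (4, 3),
    "Purvi": (4, 2),
    "Gaurav": (2, 5),
}


def euclidean(u, v):
    """Straight-line distance between two users' rating vectors."""
    return math.dist(ratings[u], ratings[v])


def cosine(u, v):
    """Cosine of the angle between two users' rating vectors."""
    a, b = ratings[u], ratings[v]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


for other in ("Purvi", "Gaurav"):
    print("Rahul vs", other,
          "distance:", round(euclidean("Rahul", other), 3),
          "cosine similarity:", round(cosine("Rahul", other), 3))
```

With these assumed ratings, the smaller Euclidean distance and the larger cosine similarity both point to Purvi as the user most similar to Rahul.
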

4.3.2 User-Based Similarity

This approach recommends items to a user based on the preferences of similar users. If two
users have rated the same items similarly, they’re considered similar. Therefore, items liked
by one user can be recommended to the other. This similarity is often computed using metrics
like cosine similarity, Pearson correlation, or Jaccard coefficient.

• We will use MovieLens dataset for finding similar users based on common movies the
users have watched and how they have rated those movies.
• The file ratings.csv in the dataset contains ratings given by users. Each line in this file
represents a rating given by a user to a movie.
• The ratings are on a 5-star scale (the MovieLens 20M release described in Section 4.1.1 uses
half-star increments, from 0.5 to 5.0). The dataset has the following features:
1. userId
2. movieId
3. rating
4. timestamp


Example Using the MovieLens Dataset

In this example, we use the MovieLens dataset, which provides movie ratings by users, with
each rating recorded in a CSV file. The following steps outline how to perform collaborative
filtering using user-based similarity:

1. Data Preparation:
• Load the dataset and drop unnecessary columns, such as the timestamp.
• Create a pivot table (matrix) in which rows represent users, columns represent movies, and
values are the ratings the users have given to those movies.
• The pivot table is sparse: movies that a user has not watched and rated are represented
as NaN. These NaNs are filled with 0s to facilitate similarity calculations.
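
The data preparation steps can be sketched without any libraries, so the structure of the result is visible. The rows below are hypothetical stand-ins for ratings.csv, and None plays the role of NaN. With pandas, the same result is typically obtained with a call such as rating_df.pivot(index='userId', columns='movieId', values='rating').fillna(0), where rating_df is an assumed name for the loaded DataFrame.

```python
# HYPOTHETICAL rows in the ratings.csv layout: userId, movieId, rating, timestamp.
rows = [
    (1, 10, 4.0, 964982703),
    (1, 20, 5.0, 964981247),
    (2, 10, 3.0, 964982224),
    (3, 20, 2.0, 964983815),
    (3, 30, 4.5, 964982931),
]

# Step 1: drop the timestamp column.
ratings = [(user, movie, rating) for user, movie, rating, _ in rows]

users = sorted({user for user, _, _ in ratings})
movies = sorted({movie for _, movie, _ in ratings})

# Step 2: pivot into a user x movie table; None marks "not rated".
pivot = {user: {movie: None for movie in movies} for user in users}
for user, movie, rating in ratings:
    pivot[user][movie] = rating

# Step 3: replace the missing ratings with 0 so rows can be compared numerically.
user_movies = {
    user: [0.0 if pivot[user][movie] is None else pivot[user][movie] for movie in movies]
    for user in users
}

print(movies)
for user, row in user_movies.items():
    print(user, row)
```
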


2. Calculating Cosine Similarity between Users

• Each row in user_movies_df represents a user, so the similarity between two rows
represents the similarity between those two users.
• sklearn.metrics.pairwise_distances can be used to compute the distance between all pairs
of users.
• pairwise_distances() takes a metric parameter that specifies the distance measure. With
the metric set to cosine it returns the cosine distance, so the cosine similarity is obtained as
1 minus that distance. A cosine similarity closer to 1 means the users are very similar, and
closer to 0 means the users are very dissimilar.

3. Finding Similar Users:

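• For each user, the other user with the highest similarity score is identified. A user’s
similarity with himself/herself is always 1, so it must be excluded from the search.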
• For instance, if user 338 is most similar to user 2 based on a cosine similarity score of
0.58, this means user 338’s ratings are closely aligned with those of user 2.
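
The similarity and lookup steps can be sketched in plain Python as follows. The rating rows are hypothetical, and the functions cosine_similarity and most_similar are illustrative names rather than the code used with the MovieLens data; unrated movies are already filled with 0.

```python
import math

# HYPOTHETICAL user x movie rows, with 0 for movies the user has not rated.
user_movies = {
    1: [4.0, 5.0, 0.0, 3.0],
    2: [4.0, 4.0, 0.0, 2.0],
    3: [0.0, 1.0, 5.0, 0.0],
}


def cosine_similarity(a, b):
    """Dot product of the two rows divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(user_id):
    """Return (other_user, similarity) for the closest user, excluding user_id itself."""
    scores = {
        other: cosine_similarity(user_movies[user_id], row)
        for other, row in user_movies.items()
        if other != user_id  # skip the trivial self-similarity of 1
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


for user in user_movies:
    other, score = most_similar(user)
    print("user", user, "is most similar to user", other, "with similarity", round(score, 3))
```

Skipping the user's own row is the design choice that matters here: without it, every user would be reported as most similar to himself/herself.
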
