Restaurant Recommendation System

INTRODUCTION

Local business review websites such as Yelp and Urbanspoon are popular destinations for people deciding where to eat out. The ability to recommend local businesses to users would be a valuable addition to these sites. In this paper we aim to build a model that recommends restaurants to users. We model the task as predicting whether a user will write a positive or a negative review for a given business. We restrict ourselves to the restaurant segment of the business category, where recommendation is a natural fit. One way this model could be used in practice is to show an automatic 'Recommend: Yes/No' message when a user visits a restaurant's profile page. The problem is thus modeled as predicting yes/no for any given restaurant and user.

In this work, we primarily explore the following directions: 1) optimization algorithms to predict the desired label, and 2) features that help improve the accuracy of this model.

PROBLEM STATEMENT

Systems exist today that recommend restaurants to users, but none of them model the problem as predicting a yes/no answer given a user and a restaurant. To our knowledge, this is the first solution that attempts to recommend a Yes/No given a user and a local business. One assumption we make in this work is that the reviews data is not biased by the label, i.e. the majority of users write reviews uniformly for the restaurants they visit, and not only because of a good or bad experience.

DATA COLLECTION

The data used in this project was obtained from the Yelp Dataset Challenge. The dataset contains five different tables: User, Business, Review, Check-In and Tips. The data covers 14092 restaurants, 252395 users, 402707 tips and 1124955 reviews. The reviews span over 10 years.

We hold out 1 month of reviews as the test data, which contains 22375 reviews from 6/16/2014 to 7/16/2014. In addition, we keep N months of data from the period ending 6/15/2014 as training data, where N is a variable to our learning algorithm. The remaining, older data serves as a source for generating historical features. Here is how the data is split based on time period:

Everything else -> Derived historical features
N months -> Training set
Last 1 month -> Test set

The problem is thus modeled as learning from current data to make predictions about the future.

Yelp users give ratings on a 5-point scale, which we map to a binary label: yes (4, 5) / no (1, 2, 3). Hence, each example in our training/test data is a single review with a binary label. Roughly 65% of the labels are yes and 35% are no, which means a trivial baseline accuracy of 65% can be achieved by predicting yes for everything.
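To make the setup concrete, here is a minimal sketch of the label mapping and time-based split, assuming the reviews live in a pandas DataFrame with date and stars columns (hypothetical names; the actual schema of the Yelp dump may differ):

```python
# Minimal sketch of the label mapping and time-based split described above.
# Column names (date, stars) are assumptions, not the paper's actual schema.
import pandas as pd

def split_by_time(reviews: pd.DataFrame, n_months: int):
    """Map 5-point ratings to a binary label and split reviews by date."""
    reviews = reviews.copy()
    reviews["label"] = (reviews["stars"] >= 4).astype(int)  # yes = 4 or 5 stars

    test_start, test_end = pd.Timestamp("2014-06-16"), pd.Timestamp("2014-07-16")
    train_start = test_start - pd.DateOffset(months=n_months)

    test = reviews[(reviews["date"] >= test_start) & (reviews["date"] <= test_end)]
    train = reviews[(reviews["date"] >= train_start) & (reviews["date"] < test_start)]
    historical = reviews[reviews["date"] < train_start]  # used only to derive features
    return historical, train, test
```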
METHODOLOGY

We will first describe the features being used and the features that we developed to solve this problem. Given that our input tuple is <user, restaurant>, we have features of the following categories:
a) User-level features
b) Restaurant-level features
c) User-Restaurant features

Summary of features

We also classify the features into the following buckets:

1. Raw features
We have 407 raw features from the data itself, comprising 5 user-level features and 402 restaurant-level features. User-level features include the number of days on Yelp, the number of fans, the number of votes, etc. Restaurant-level features include binary features for attributes (parking, take-out, etc.) and categories (cuisine).

2. Derived/Computed features
As described in the previous section, we hold out the majority of the old reviews data for the computation of features. This old period precedes and does not overlap with the periods from which we sample training and test data. We compute 16 derived features from this 'historical' holdout period, described in more detail below. These consist of B) user-level and business-level aggregates, such as the average user historical rating, the average business rating and the number of reviews; C) user-category features, such as the user's average historical rating on the current restaurant's categories; and D) features from the user's social network capturing friends' preferences.

A significant amount of work went into engineering these features and trying different ways to compute them. A lot of the improvement in results came from the iterative creation of new features. We next go into the details of the features, and then summarize the results of adding them.

A. Raw features

From the raw data we had five user-level features: the number of fans, the number of days on Yelp and three different vote counts. There are 61 business attribute features, such as binary information about ambience, diet and facilities. There are 233 binary features covering cuisine and style of restaurant, and 108 binary features for the city in which the business is located.
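As an illustration of how such binary features can be derived, here is a sketch that one-hot encodes business categories with pandas (the input schema is a hypothetical simplification, not the dataset's actual layout):

```python
# Sketch: expanding categorical business data into binary features.
# `categories` is assumed to be a list of strings per business (hypothetical schema).
import pandas as pd

businesses = pd.DataFrame({
    "business_id": ["b1", "b2"],
    "categories": [["Thai", "Buffets"], ["Fast Food"]],
})

# One binary column per category, e.g. business_categories_Buffets.
category_features = (
    businesses.set_index("business_id")["categories"]
    .explode()                      # one row per (business, category) pair
    .str.get_dummies()              # one 0/1 column per distinct category
    .groupby(level=0).max()         # collapse back to one row per business
    .add_prefix("business_categories_")
)
```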

B. Historical user and business aggregated features

The first set of features we implemented was based on the intuition that a business is likely to receive ratings corresponding to its historical ratings:
User-level: average historical rating from this user, number of reviews.
Business-level: average historical rating for this business, number of reviews.

Missing features: since historical data can be missing for certain users or businesses, we circumvented this by adding variations of each feature with default values ranging from the minimum to the average to the maximum. Using a default seems to have helped across the board, as it gives the algorithm a way to treat missing values differently than just zero.
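As an illustration, such aggregates with a 'with default' variant might be computed as follows; the feature names and the use of the global average as the default follow the descriptions in this paper, but the code itself is a sketch, not the authors' implementation:

```python
# Sketch: historical aggregate with a "with default" variant for missing values.
import pandas as pd

def business_rating_features(historical: pd.DataFrame, examples: pd.DataFrame):
    """Join the average historical business rating onto training/test examples."""
    avg = historical.groupby("business_id")["stars"].mean()
    global_avg = historical["stars"].mean()  # default when a business has no history

    out = examples.copy()
    out["business_averageBusinessRating"] = out["business_id"].map(avg)
    out["business_averageBusinessRatingWithDefault"] = (
        out["business_averageBusinessRating"].fillna(global_avg)
    )
    return out
```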
C. User-business category based affinity

In order to improve accuracy further, we decided to implement features that model each user's personal preferences. These features are computed as follows:

Step 1: Compute each user's personal preference for each of the possible business categories and attributes. This is computed as the average rating the user gave to businesses of each category in the historical period. One such feature is avg_rating_for_thaicuisine_for_this_user.

Step 2: In the training and test data, compute matching features comparing the user's preferences with the business's categories. E.g. the best feature in this category was the user's average rating taken over all categories that match the given business's categories.

The intuition behind these features is that a user's personal preference for certain categories of restaurants should be a strong signal for whether the user will like a future restaurant.
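A minimal sketch of Step 2, assuming Step 1 has already produced a per-user dict mapping each category to the user's average historical rating (data structures and names are illustrative):

```python
# Sketch of the category-affinity matching feature for one <user, restaurant> pair.
from statistics import mean

def category_match_rating(user_pref: dict[str, float],
                          business_categories: list[str],
                          default: float) -> float:
    """Average the user's historical per-category ratings over the categories
    that match the given business; fall back to a default when nothing matches."""
    matched = [user_pref[c] for c in business_categories if c in user_pref]
    return mean(matched) if matched else default

# Usage: a userbus_averageRatingForCatMatchWithDefault-style value.
pref = {"Thai": 4.5, "Buffets": 2.0}
feature = category_match_rating(pref, ["Thai", "Breakfast"], default=3.7)  # -> 4.5
```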
D. Collaborative Features

The publicly available dataset also provides each user's social graph, i.e. the user's friends. Using the intuition that the likings of a user's friends are good representatives of the user's own likings, we developed the following feature: given a business and a user in the training/test set, the average rating for this business from this user's friends in the historical period. As before, we used suitable variations with default values for when the feature was missing a value.
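A minimal sketch of this friends-based feature, assuming the social graph as a dict of friend sets and historical ratings keyed by (user, business) pairs (illustrative data structures, not the paper's implementation):

```python
# Sketch: average historical rating of a business from the user's friends.
def friend_avg_rating(user: str, business: str,
                      friends: dict[str, set[str]],
                      hist_ratings: dict[tuple[str, str], float],
                      default: float) -> float:
    ratings = [hist_ratings[(f, business)]
               for f in friends.get(user, set())
               if (f, business) in hist_ratings]
    # avgfriendratingonthisbusinessD-style: default when no friend rated it.
    return sum(ratings) / len(ratings) if ratings else default
```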

UNDERSTANDING THE FEATURES

Before delving into training a model, we want to analyze the features. We used a simple measure, the F-score, to identify the top features in our data. Here are the top features by F-score on a 1-month training set:

Index | Feature | F-score
1 | business_averageBusinessRatingWithDefault | 0.1011
2 | business_averageBusinessRating | 0.0328
3 | userbus_averageRatingForAttrMatchWithDefault | 0.0123
4 | userbus_averageRatingForAttrMatchWithDefaultW | 0.0121
5 | user_averageUserRatingWithDefault | 0.0120
6 | userbus_averageRatingForCatMatchWithDefaultW | 0.0120
7 | userbus_averageRatingForCatMatchWithDefault | 0.0113
8 | avgfriendratingonthisbusinessD | 0.0063
9 | business_attributes.[Link] | 0.0039
10 | business_categories_Buffets | 0.0035
11 | business_attributes.[Link] | 0.0033
12 | business_attributes.Caters | 0.0025
13 | business_categories_Fast Food | 0.0024
14 | business_attributes.[Link] | 0.0024
… | others | …

Historical user and business aggregated features: the top feature, business_averageBusinessRating, is the average rating computed from the historical holdout period. business_averageBusinessRatingWithDefault is normalized to always have a value even when historical data is missing for the business (the default we use is the average rating across all reviews). user_averageUserRatingWithDefault is the corresponding average rating for the user, with a default.

User-business category based affinity: userbus_averageRatingForAttrMatchWithDefault measures the affinity of a user with a business based on the user's historical ratings on the business's attributes (the corresponding CatMatch features use its categories).

Collaborative features: avgfriendratingonthisbusinessD is the average rating for this restaurant from the user's friends, with a default.

The remaining features on the list are binary features on business categories and attributes, and their names are self-explanatory. The top 8 features are all derived features described in the previous section, with the top feature being the historical average rating of the given restaurant.
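The paper does not spell out the exact F-score formula; the sketch below assumes the Fisher-style F-score commonly used for SVM feature selection, applied per feature to the binary-labeled training matrix:

```python
# Sketch: Fisher-style F-score for ranking features against a binary label
# (an assumption about the measure used; the paper only names "F-Score").
# F(i) = ((mean_pos_i - mean_i)^2 + (mean_neg_i - mean_i)^2) / (var_pos_i + var_neg_i)
import numpy as np

def f_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(axis=0) - X.mean(axis=0)) ** 2 \
        + (neg.mean(axis=0) - X.mean(axis=0)) ** 2
    den = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return num / den

# Usage: rank feature indices from most to least discriminative.
# top = np.argsort(-f_scores(X, y))
```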
MODELING THE PROBLEM

In this section we describe the different algorithms we used, varying parameters such as the size of the training data and the subset of features, and evaluate their performance. We experimented with a few different algorithms, variations in training data, and variations in features, and we describe the results from each separately. Before proceeding, we define the feature sets used in the results:

Feature Set 1: only the raw features defined in section A.
Feature Set 2: the raw and derived features defined in sections A and B, i.e. including historical average ratings per user and business and simple review statistics.
Feature Set 3: all the raw and derived features defined in sections A, B, C and D, i.e. everything previously mentioned, including user-category affinity and collaborative features.

A. Learning Algorithms Used

We experimented with the following algorithms to train the rating predictor classifier:
- SVM with RBF kernel
- Linear SVM
- Logistic regression
A.1. SVM with RBF Kernel

Our first approach was to train an SVM classifier using the radial basis function kernel. Since the amount of training data needed was not clear at this point, we vary the training data size, with review periods ranging from 1 week, 1 month and 2 months to 4 months (the test data remains unchanged). We measure accuracy as the percentage of reviews for which we predicted the positive or negative label correctly. The table below shows the change in performance of the model with varying training data size, using Feature Set 3.

Table 1: SVM with RBF kernel with Feature Set 3
Training period | # examples | Training accuracy | Test accuracy
1 week | 5000 | 97.88% | 66.12%
1 month | 20000 | 95.10% | 65.50%
2 month | 40000 | 92.88% | 65.88%
4 month | 80000 | 90.18% | 66.22%

[Figure: learning curve with SVM (RBF kernel); x-axis: training set size (1 week to 4 months); y-axis: accuracy (60%-100%); series: training accuracy and testing accuracy.]

From the learning curve, it is clear that some over-fitting is happening with less training data, and that adding more training data helps. However, even with 4 months' worth of training data, the testing accuracy improves only marginally.

Reducing over-fitting: the high degree of over-fitting likely arises from the high-dimensional feature mapping of the RBF kernel. We iterated on regularization methods to achieve significantly better results, optimizing gamma and C using a parameter sweep with cross-validation (a sketch of such a sweep follows at the end of this subsection).

Table 2: Regularized SVM with RBF kernel with Feature Set 3
Training period | # examples | Training accuracy | Test accuracy
1 week | 5000 | 73.10% | 68.59%
1 month | 20000 | 70.78% | 69.41%
2 month | 40000 | 70.38% | 69.33%
4 month | 80000 | 70.16% | 69.54%

This clearly shows less over-fitting and better test accuracy.
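The sketch below shows one way such a cross-validated sweep over gamma and C might look in scikit-learn; the grid values and tooling are assumptions, not the paper's actual setup:

```python
# Sketch: cross-validated sweep over C and gamma for an RBF-kernel SVM.
# The grid values are illustrative assumptions, not the paper's actual sweep.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
# search.fit(X_train, y_train)    # X_train/y_train: feature matrix and binary labels
# best = search.best_estimator_   # regularized model behind Table 2-style results
```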
A.2. Logistic regression

We applied the same training and test data with Feature Set 3 to logistic regression; the results were as follows:

Table 3: Logistic regression with Feature Set 3
Training period | # examples | Training accuracy | Test accuracy
1 week | 5000 | 71.97% | 69.10%
1 month | 20000 | 69.93% | 69.55%
2 month | 40000 | 69.43% | 69.68%
4 month | 80000 | 69.40% | 69.28%

A.3. SVM with Linear Kernel

We observed that, given our vast feature set, a high-dimensional kernel for the SVM did not add much value; in fact, there was over-fitting in the high-dimensional space until significant regularization was added. We therefore experimented with an SVM with a linear kernel, which reduced over-fitting and produced comparable, and even slightly better, results, as shown in the table below.

Table 4: SVM with linear kernel
Training period | # examples | Training accuracy | Test accuracy
1/2 week | 3000 | 73.02% | 68.01%
1 week | 5000 | 72.31% | 68.94%
1 month | 20000 | 70.10% | 69.89%
2 month | 40000 | 69.73% | 69.77%
4 month | 80000 | 69.63% | 69.49%

[Figure: learning curve with linear SVM; x-axis: training set size (1/2 week to 4 months); y-axis: accuracy (67%-73%); series: training accuracy and testing accuracy.]

As the learning curve shows, we achieve comparable training and test accuracy with large enough training data, i.e. periods of 1 month or more. There is little to no over-fitting at this stage, and this is the maximum we can learn from the given set of features and training data.
B. Impact of derived feature sets

The following table compares testing accuracies with the different feature sets and varying training data.

Table 5: Testing accuracy with different feature sets, with linear SVM
Training period | Feature Set 1 (raw only) | Feature Set 2 | Feature Set 3 (all derived)
1/2 week | 64.90% | 67.91% | 68.03%
1 week | 65.68% | 68.81% | 68.91%
1 month | 66.97% | 69.79% | 69.89%
2 month | 66.97% | 69.77% | 69.82%
4 month | 67.09% | 69.45% | 69.50%
8 month | - | 69.30% | 69.42%

The table clearly shows the superior results obtained with the derived features. Feature Set 3, the set of all raw and derived features, consistently performs better than Feature Set 2 for all training data sizes, although only marginally.

C. Impact of varying training data size

We see interesting results as the training data size varies. Specifically, training accuracy (Table 4) goes down as training data size increases. This indicates a good reduction in variance, in that the over-fitting problem is fixed by increased training data.

To analyze testing accuracy more closely, the zoomed-in learning curve below plots only testing accuracy, for Feature Set 3 with the linear SVM.

[Figure: testing accuracy vs. training data size, Feature Set 3 with linear SVM; x-axis: 1/2 week to 8 months; y-axis: 67.9%-69.9%.]

Test accuracy increases with more training data until about 1 month of training data, but it decreases with significantly more data, such as 4 or 8 months. We explain this behavior with the following hypotheses. The way we grow the training data is not by random sampling; rather, we increase training data by going further back in time and holding out more old data for training. This also means that the historical holdout period from which the derived features are computed gets older as the training set grows. We summarize the hypotheses here:
o Recent training data is more representative of reviews in the upcoming period than older training data.
o Derived features from the recent period are stronger signals for predicting reviews in the upcoming period.

Thus, in machine learning systems that use past data to predict future results, it is important to optimize how the data is divided between the feature holdout and the training set. We would also like to experiment with weighting training data by its age; a sketch of this idea follows.
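The paper leaves age weighting as a future experiment; below is a hedged sketch of one possible realization, using per-sample weights that decay exponentially with the age of the review (the decay scheme and half-life are assumptions):

```python
# Sketch: weighting training reviews by recency (exponential decay is an assumption;
# the paper only proposes weighting by age, not a specific scheme).
import numpy as np
import pandas as pd
from sklearn.svm import LinearSVC

def age_weights(dates: pd.Series, reference: pd.Timestamp, half_life_days: float = 90.0):
    age_days = (reference - dates).dt.days.to_numpy()
    return np.power(0.5, age_days / half_life_days)  # weight halves every half-life

# model = LinearSVC()
# model.fit(X_train, y_train,
#           sample_weight=age_weights(train["date"], pd.Timestamp("2014-06-15")))
```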
SUMMARY OF RESULTS

We summarize the results from the previous sections as follows:

1. The different algorithms produced comparable results, although the linear SVM was the least susceptible to over-fitting and performed marginally better. We achieved a testing accuracy of 69.89% with the linear SVM, Feature Set 3 and 1 month of reviews as training data.

2. We see a significant improvement from derived features, specifically from:
a. historical average ratings for the business;
b. the affinity of the user to a specific business category;
c. collaborative features.

3. Increased training data reduced over-fitting, but there is value in weighting training data by the age of the label: recent data is more useful for learning than older data.

4. It was important to treat missing feature values differently than zero, by providing default-valued variations for the model to learn from.

5. In the end, we perform about 5 percentage points better in accuracy than the trivial baseline of always predicting yes.

FUTURE WORK

We acknowledge that the problem being solved is hard, specifically for the following reasons:

1. Unclear predictability of reviews: any supervised learning problem aims to learn the labels from the provided features. The underlying assumption we make here is that the features we have access to are sufficient to predict a positive or negative review. However, one can imagine that a future review depends heavily on the experience the user has at the restaurant, which is not captured anywhere in the features. This could make the correlation between the features and the label weaker than would be ideal for a supervised learning problem. At this stage this is not settled, as we have not yet explored all possible features, but it is a concern.

2. Unclear bias in the reviews used for training and evaluation: one assumption we make is that a user's decision to review a restaurant is random, and not biased by an unusually good or bad experience.

Future work would involve identifying stronger features beyond what is available in the datasets, as well as investing in an approach to gather training and evaluation data by alternate means (such as explicit human judgment systems).
