NAME : Kanade Shubhada Sanjay
ROLL NO. : 65
DIV : A
MINI-PROJECT
Zomato-rating-prediction
1. Importing the libraries
In [1]: import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
1.1 Loading the dataset
In [2]: data = pd.read_csv('../input/zomato-bangalore-restaurants/zomato.csv')
In [3]: data
Out[3]: [DataFrame preview truncated in the export; visible columns include url,
address, name, online_order and book_table.]
51717 rows × 17 columns
1.2 Checking the shape of the dataset
In [4]: data.shape
Out[4]: (51717, 17)
There are 51,717 samples in total, with 17 features.
In [5]: data.columns
Out[5]: Index(['url', 'address', 'name', 'online_order', 'book_table', 'rate', 'votes',
'phone', 'location', 'rest_type', 'dish_liked', 'cuisines',
'approx_cost(for two people)', 'reviews_list', 'menu_item',
'listed_in(type)', 'listed_in(city)'],
dtype='object')
1.3 Checking the data types
In [6]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51717 entries, 0 to 51716
Data columns (total 17 columns):
# Column Non-Null Count Dtype
0 url 51717 non-null object
1 address 51717 non-null object
2 name 51717 non-null object
3 online_order 51717 non-null object
4 book_table 51717 non-null object
5 rate 43942 non-null object
6 votes 51717 non-null int64
7 phone 50509 non-null object
8 location 51696 non-null object
9 rest_type 51490 non-null object
10 dish_liked 23639 non-null object
11 cuisines 51672 non-null object
12 approx_cost(for two people) 51371 non-null object
13 reviews_list 51717 non-null object
14 menu_item 51717 non-null object
15 listed_in(type) 51717 non-null object
16 listed_in(city) 51717 non-null object
dtypes: int64(1), object(16)
memory usage: 6.7+ MB
There are many object-type columns; later we will convert the relevant ones to
numeric types.
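As a quick way to see which of those object columns actually hold numeric-looking values, a small sketch (not part of the original notebook) could be:

# Sketch: count distinct values in each object-typed column; 'rate' and
# 'approx_cost(for two people)' store numbers as strings and are the ones
# converted to numeric types in the cleaning steps below.
object_cols = data.select_dtypes(include='object').columns
print(data[object_cols].nunique().sort_values())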
2. Data Cleaning
2.1 Checking the missing values
In [7]: data.isnull().sum()
Out[7]: url 0
address 0
name 0
online_order 0
book_table 0
rate 7775
votes 0
phone 1208
location 21
rest_type 227
dish_liked 28078
cuisines 45
approx_cost(for two people) 346
reviews_list 0
menu_item 0
listed_in(type) 0
listed_in(city) 0
dtype: int64
There are many null values. We can clearly see that the 'rate', 'phone', 'location',
'rest_type', 'dish_liked', 'cuisines' and 'approx_cost(for two people)' columns have
missing values, so first we have to handle them.
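To see how severe each gap is before deciding between dropping and imputing, a minimal sketch of the per-column missing percentage (computed on the same `data` frame) could be:

# Sketch: share of missing values per column, to judge how much data dropna() would cost.
missing_pct = data.isnull().mean().mul(100).round(2).sort_values(ascending=False)
print(missing_pct[missing_pct > 0])

Around 54% of 'dish_liked' is missing (28078 of 51717 rows), while the other affected columns have comparatively small gaps.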
2.2 Removing unnecessary columns from the data
In [8]: df = data.drop(['url', 'phone'], axis = 1) # dropped 'url' and 'phone' columns
In [9]: df.head()
Out[9]: [Preview of the first five rows of df after dropping 'url' and 'phone';
visible columns include address, name, online_order, book_table, rate, votes,
location, rest_type and dish_liked.]
2.3 Handling the null or missing values
In [10]: df.dropna(inplace = True)
In [11]: df.isnull().sum()
Out[11]: address 0
name 0
online_order 0
book_table 0
rate 0
votes 0
location 0
rest_type 0
dish_liked 0
cuisines 0
approx_cost(for two people) 0
reviews_list 0
menu_item 0
listed_in(type) 0
listed_in(city) 0
dtype: int64
Now there are no null values.
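Dropping every row with any missing value is the simplest option, but it discards a large share of the data, mostly because of 'dish_liked'. A hedged alternative sketch (not the approach used in this notebook; shown for comparison only) would be to drop that sparse column instead of its rows:

# Sketch of an alternative to df.dropna(): 'dish_liked' accounts for most of the
# missing values, so dropping the column (instead of its rows) keeps far more data.
df_alt = data.drop(['url', 'phone', 'dish_liked'], axis=1).dropna()
print(df_alt.shape)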
2.4 Checking and handling duplicate values
In [12]: df.duplicated().sum()
Out[12]: 11
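There are 11 duplicated rows. If you want to inspect them before dropping, a minimal sketch (not in the original notebook) would be:

# Sketch: view the duplicated rows (both occurrences) before dropping them.
dupes = df[df.duplicated(keep=False)].sort_values('name')
print(dupes[['name', 'address', 'rate', 'votes']])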
In [13]: df.drop_duplicates(inplace = True)
df.duplicated().sum()
Out[13]: 0
Now there are no duplicate values.
2.5 Renaming the columns appropriately
In [14]: df = df.rename(columns = {'approx_cost(for two people)':'cost',
'listed_in(type)':'type', 'listed_in(city)': 'city'})
In [15]: df.head()
Out[15]: [Preview of the first five rows of df; 'approx_cost(for two people)',
'listed_in(type)' and 'listed_in(city)' now appear as 'cost', 'type' and 'city'.]
Successfully renamed the columns.
2.6 Cleaning the "cost" column
In [16]: df['cost'].unique()
Out[16]: array(['800', '300', '600', '700', '550', '500', '450', '650', '400',
'750', '200', '850', '1,200', '150', '350', '250', '1,500',
'1,300', '1,000', '100', '900', '1,100', '1,600', '950', '230',
'1,700', '1,400', '1,350', '2,200', '2,000', '1,800', '1,900',
'180', '330', '2,500', '2,100', '3,000', '2,800', '3,400', '40',
'1,250', '3,500', '4,000', '2,400', '1,450', '3,200', '6,000',
'1,050', '4,100', '2,300', '120', '2,600', '5,000', '3,700',
'1,650', '2,700', '4,500'], dtype=object)
Here we can see that the data points are strings, and some values such as 5,000 and
6,000 contain a comma (','). We have to remove the ',' from the values and convert
them to a numeric type.
In [17]: df['cost'] = df['cost'].apply(lambda x: x.replace(',', ''))  # strip the ',' thousands separator
df['cost'] = df['cost'].astype(float)
df['cost'].unique()
Out[17]: array([ 800., 300., 600., 700., 550., 500., 450., 650., 400.,
750., 200., 850., 1200., 150., 350., 250., 1500., 1300.,
1000., 100., 900., 1100., 1600., 950., 230., 1700., 1400.,
1350., 2200., 2000., 1800., 1900., 180., 330., 2500., 2100.,
3000., 2800., 3400., 40., 1250., 3500., 4000., 2400., 1450.,
3200., 6000., 1050., 4100., 2300., 120., 2600., 5000., 3700.,
1650., 2700., 4500.])
Now we have successfully converted the values to a numeric type.
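The lambda above works here because rows with a missing 'cost' were already dropped. A slightly more defensive sketch (my own variation, not the notebook's approach) that tolerates NaNs and is safe to re-run would be:

# Sketch: defensive version of the cost conversion; NaN-tolerant and idempotent.
df['cost'] = pd.to_numeric(df['cost'].astype(str).str.replace(',', '', regex=False),
                           errors='coerce')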
2.7 Handling the "rate" column
In [18]: df['rate'].unique()
Out[18]: array(['4.1/5', '3.8/5', '3.7/5', '4.6/5', '4.0/5', '4.2/5', '3.9/5',
'3.0/5', '3.6/5', '2.8/5', '4.4/5', '3.1/5', '4.3/5', '2.6/5',
'3.3/5', '3.5/5', '3.8 /5', '3.2/5', '4.5/5', '2.5/5', '2.9/5',
'3.4/5', '2.7/5', '4.7/5', 'NEW', '2.4/5', '2.2/5', '2.3/5',
'4.8/5', '3.9 /5', '4.2 /5', '4.0 /5', '4.1 /5', '2.9 /5',
'2.7 /5', '2.5 /5', '2.6 /5', '4.5 /5', '4.3 /5', '3.7 /5',
'4.4 /5', '4.9/5', '2.1/5', '2.0/5', '1.8/5', '3.4 /5', '3.6 /5',
'3.3 /5', '4.6 /5', '4.9 /5', '3.2 /5', '3.0 /5', '2.8 /5',
'3.5 /5', '3.1 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5',
'2.1 /5', '2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)
The rating column is also string-typed; we have to convert it to a numeric type by
removing the '/5' suffix from the values.
There is also a 'NEW' value, which carries no actual rating, so we have to remove those rows.
In [19]: df = df.loc[df.rate != 'NEW'] # getting rid of 'NEW' entries
In [20]: df['rate'].unique()
Out[20]: array(['4.1/5', '3.8/5', '3.7/5', '4.6/5', '4.0/5', '4.2/5', '3.9/5',
'3.0/5', '3.6/5', '2.8/5', '4.4/5', '3.1/5', '4.3/5', '2.6/5',
'3.3/5', '3.5/5', '3.8 /5', '3.2/5', '4.5/5', '2.5/5', '2.9/5',
'3.4/5', '2.7/5', '4.7/5', '2.4/5', '2.2/5', '2.3/5', '4.8/5',
'3.9 /5', '4.2 /5', '4.0 /5', '4.1 /5', '2.9 /5', '2.7 /5',
'2.5 /5', '2.6 /5', '4.5 /5', '4.3 /5', '3.7 /5', '4.4 /5',
'4.9/5', '2.1/5', '2.0/5', '1.8/5', '3.4 /5', '3.6 /5', '3.3 /5',
'4.6 /5', '4.9 /5', '3.2 /5', '3.0 /5', '2.8 /5', '3.5 /5',
'3.1 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5', '2.1 /5',
'2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)
In [21]: df['rate'] = df['rate'].apply(lambda x: x.replace('/5', ''))  # strip the '/5' suffix
df['rate'].unique()
Out[21]: array(['4.1', '3.8', '3.7', '4.6', '4.0', '4.2', '3.9', '3.0', '3.6',
'2.8', '4.4', '3.1', '4.3', '2.6', '3.3', '3.5', '3.8 ', '3.2',
'4.5', '2.5', '2.9', '3.4', '2.7', '4.7', '2.4', '2.2', '2.3',
'4.8', '3.9 ', '4.2 ', '4.0 ', '4.1 ', '2.9 ', '2.7 ', '2.5 ',
'2.6 ', '4.5 ', '4.3 ', '3.7 ', '4.4 ', '4.9', '2.1', '2.0', '1.8',
'3.4 ', '3.6 ', '3.3 ', '4.6 ', '4.9 ', '3.2 ', '3.0 ', '2.8 ',
'3.5 ', '3.1 ', '4.8 ', '2.3 ', '4.7 ', '2.4 ', '2.1 ', '2.2 ',
'2.0 ', '1.8 '], dtype=object)
In [22]: df['rate'] = df['rate'].apply(lambda x: float(x))  # float() also trims the stray spaces
df['rate']
Out[22]: 0 4.1
1 4.1
2 3.8
3 3.7
4 3.8
...
51705 3.8
51707 3.9
51708 2.8
51711 2.5
51715 4.3
Name: rate, Length: 23248, dtype: float64
Now our data is cleaned and we can perform visualization.
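If the same cleaning ever needs to be re-applied to a fresh copy of the data, the rate steps above can be folded into one helper. This is only a sketch, assuming the raw column contains 'x.y/5' strings (possibly with stray spaces), 'NEW', or NaN, as the outputs above suggest:

# Sketch: consolidated 'rate' cleaning, mirroring the steps applied cell by cell above.
def clean_rate(frame):
    out = frame[frame['rate'].notna() & (frame['rate'] != 'NEW')].copy()
    out['rate'] = (out['rate'].str.replace('/5', '', regex=False)
                              .str.strip()
                              .astype(float))
    return out

# Hypothetical usage: cleaned = clean_rate(data.dropna())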
3. Data Visualization
3.1 Most famous restaurant chains in Bangalore
In [23]: plt.figure(figsize = (17,10))
chains = df['name'].value_counts()[:20]
sns.barplot(x = chains, y= chains.index, palette= 'deep')
plt.title('Most famous restaurant chains in Bangalore')
plt.xlabel('Number of outlets')
plt.show()
Insights:
'Onesta', 'Empire Restaurant' & 'KFC' are the most famous restaurant chains in Bangalore.
3.2 Checking online order availability
In [24]: v = df['online_order'].value_counts()
fig = plt.gcf()
fig.set_size_inches((10,6))
cmap = plt.get_cmap('Set3')
color = cmap(np.arange(len(v)))
plt.pie(v, labels = v.index, wedgeprops = dict(width = 0.6), autopct = '%0.02f', colors = color, shadow = True)
plt.title('Online orders', fontsize = 20)
plt.show()
Insight:
Most restaurants offer the option of online ordering and delivery.
3.3 Checking table booking availability
In [25]: v = df['book_table'].value_counts()
fig = plt.gcf()
fig.set_size_inches((8,6))
cmap = plt.get_cmap('Set1')
color = cmap(np.arange(len(v)))
plt.pie(v, labels = v.index, wedgeprops = dict(width = 0.6), autopct = '%0.02f', colors = color, shadow = True)
plt.title('Book Table', fontsize = 20)
plt.show()
Insight:
Most restaurants don't offer table booking.
3.4 Rating Distribution
In [26]: plt.figure(figsize = (9,7))
sns.distplot(df['rate'])
plt.title('Rating Distribution')
Out[26]: Text(0.5, 1.0, 'Rating Distribution')
Insight:
We can infer from the plot above that most of the ratings lie between 3.5 and 4.5.
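sns.distplot still works in the seaborn version used here but is deprecated in recent releases; a hedged equivalent with the current API (assuming seaborn >= 0.11) would be:

# Sketch: same rating distribution using the non-deprecated seaborn API.
plt.figure(figsize=(9, 7))
sns.histplot(df['rate'], kde=True, stat='density')
plt.title('Rating Distribution')
plt.show()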