import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
dataframe = pd.read_csv("Zomato data .csv")
print(dataframe)
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1/5 775
1 Spice Elephant Yes No 4.1/5 787
2 San Churro Cafe Yes No 3.8/5 918
3 Addhuri Udupi Bhojana No No 3.7/5 88
4 Grand Village No No 3.8/5 166
.. ... ... ... ... ...
143 Melting Melodies No No 3.3/5 0
144 New Indraprasta No No 3.3/5 0
145 Anna Kuteera Yes No 4.0/5 771
146 Darbar No No 3.0/5 98
147 Vijayalakshmi Yes No 3.9/5 47
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
.. ... ...
143 100 Dining
144 150 Dining
145 450 Dining
146 800 Dining
147 200 Dining
[148 rows x 7 columns]
Next, we display the loaded data in the Jupyter notebook.
dataframe
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1/5 775
1 Spice Elephant Yes No 4.1/5 787
2 San Churro Cafe Yes No 3.8/5 918
3 Addhuri Udupi Bhojana No No 3.7/5 88
4 Grand Village No No 3.8/5 166
.. ... ... ... ... ...
143 Melting Melodies No No 3.3/5 0
144 New Indraprasta No No 3.3/5 0
145 Anna Kuteera Yes No 4.0/5 771
146 Darbar No No 3.0/5 98
147 Vijayalakshmi Yes No 3.9/5 47
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
.. ... ...
143 100 Dining
144 150 Dining
145 450 Dining
146 800 Dining
147 200 Dining
[148 rows x 7 columns]
Now we can start working on this data.
First, there is one problem in the dataset: every value in the rate column carries a "/5" suffix (for example, 4.1/5), which we want to remove. Everything else looks fine.
Convert the Data Type of the rate Column
def handleRate(value):
    # Keep the part before the "/" (e.g. "4.1/5" -> "4.1") and convert it to float
    value = str(value).split('/')
    value = value[0]
    return float(value)

dataframe['rate'] = dataframe['rate'].apply(handleRate)
print(dataframe.head())
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
First, we created a user-defined function named "handleRate" that takes a single value. Why did we use str()? Because the values in this column are in string format, and the function needs a string to split. The rating was written as 4.1/5, so to cut off the 5 we use the split function, which breaks the string at '/'.
We need only 4.1, so value = value[0]: the 4.1 sits at position 0 of the split result. The function then returns this value as a float.
#### Now look at the line that assigns to the dataframe column: the dataframe has a rate column that we need to change, so we applied the newly created function, handleRate, to it.
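To make the cleaning step concrete, here is a minimal sketch on a few made-up rating strings (the sample values are hypothetical, not read from the CSV). The name handle_rate is only an illustrative variant of the function above, and the vectorized version shows one alternative way to get the same result with pandas string methods.

```python
import pandas as pd


def handle_rate(value):
    # "4.1/5" -> ["4.1", "5"] -> "4.1" -> 4.1
    return float(str(value).split("/")[0])


# Hypothetical sample values in the same "x/5" format as the rate column
sample = pd.Series(["4.1/5", "3.8/5", "3.0/5"])

# Element-wise conversion, as done in the notebook with apply()
converted = sample.apply(handle_rate)

# Alternative: the same conversion with vectorized string methods
vectorized = sample.str.split("/").str[0].astype(float)

print(converted.tolist())
print(vectorized.tolist())
```

Both approaches assume every value has the form "number/5"; any other text in the column would make float() raise an error.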
dataframe
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
.. ... ... ... ... ...
143 Melting Melodies No No 3.3 0
144 New Indraprasta No No 3.3 0
145 Anna Kuteera Yes No 4.0 771
146 Darbar No No 3.0 98
147 Vijayalakshmi Yes No 3.9 47
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
.. ... ...
143 100 Dining
144 150 Dining
145 450 Dining
146 800 Dining
147 200 Dining
[148 rows x 7 columns]
Now we check once more whether any value is missing, that is, whether any value is null.
dataframe.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148 entries, 0 to 147
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 148 non-null object
1 online_order 148 non-null object
2 book_table 148 non-null object
3 rate 148 non-null float64
4 votes 148 non-null int64
5 approx_cost(for two people) 148 non-null int64
6 listed_in(type) 148 non-null object
dtypes: float64(1), int64(2), object(4)
memory usage: 8.2+ KB
All values are present: every column shows 148 non-null entries. Here the info() function was used only to get this summary.
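As a complement to info(), missing values can also be counted directly. The sketch below uses a small hypothetical frame (toy, with one value deliberately set to NaN) rather than the Zomato data, so the numbers are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing rating
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "rate": [4.1, np.nan, 3.8],
    "votes": [775, 787, 918],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# True if at least one value anywhere in the frame is missing
has_missing = bool(toy.isna().any().any())

print(missing_per_column)
print(has_missing)
```

Run on the notebook's dataframe, the same two calls should report zero missing values per column, matching the info() output above.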
Now the first question.
Q1. What Type of Restaurant Do the Majority of Customers Order From?
In other words, we need to find the type of restaurant from which the majority of customers order food.
Type of Restaurant
Note: we have to show this using a bar graph.
dataframe.head()
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
Calling head() shows the first 5 rows of the data.
Now we want to draw a bar graph of the restaurant types, and for this we will use the seaborn library.
sns.countplot(x=dataframe['listed_in(type)'])
plt.xlabel("type of restaurant")
Text(0.5, 0, 'type of restaurant')
Now let us explain these two lines of code.
What is a count plot?
A count plot is a type of visualization that displays the number of observations in each category of a categorical variable.
sns.countplot(x=dataframe['listed_in(type)'])
This shows the counts of observations in each categorical bin using bars. Whenever we need a plot that counts the exact number of rows in each category, we use countplot.
x=dataframe['listed_in(type)']
On the x-axis we want the type of restaurant, which is the listed_in(type) column.
plt.xlabel("type of restaurant")
This line sets the label of the x-axis.
Conclusion: the majority of the restaurants fall in the Dining category.
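The numbers that a count plot draws can also be computed directly, which is handy for checking the chart. This is a minimal sketch on hypothetical rows (toy is not the real dataset); value_counts() returns the number of rows per category.

```python
import pandas as pd

# Hypothetical rows; only the category column matters here
toy = pd.DataFrame({
    "listed_in(type)": ["Buffet", "Dining", "Dining", "Cafes", "Dining"],
})

# Rows per restaurant type: the bar heights of the count plot
counts = toy["listed_in(type)"].value_counts()

# Category with the most rows
most_common = counts.idxmax()

print(counts)
print(most_common)
```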
### Q2. How many votes has each type of restaurant received from customers?
In other words, we need the total number of votes for each restaurant type.
dataframe.head()
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
grouped_data = dataframe.groupby('listed_in(type)')['votes'].sum()
result = pd.DataFrame({'votes': grouped_data})
plt.plot(result, c="green", marker="o")
plt.xlabel("Type of restaurant", c="red", size=20)
plt.ylabel("votes", c="red", size=20)
Text(0, 0.5, 'votes')
Now let us explain this code.
1st) We create a variable named grouped_data.
2nd) Only two columns of the dataframe matter here, listed_in(type) and votes. We group the rows by listed_in(type) and sum the votes within each group:
dataframe.groupby('listed_in(type)')['votes'].sum()
3rd) The grouped data is then passed into result. A pandas DataFrame (pd.DataFrame) is a structure that contains two-dimensional data and its corresponding labels.
4th) marker="o" draws a circular marker at each data point, and c="green" sets the line color.
5th) On the x-axis we want the label "Type of restaurant".
6th) Similarly, on the y-axis we want the label "votes".
Conclusion: Dining restaurants have received the maximum number of votes.
This is how we can find insights from data, so that companies can plan their strategies.
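The group-and-sum step can be tried on a tiny example. The frame below is hypothetical (the vote counts are invented for illustration), but the groupby call is the same one used in the notebook.

```python
import pandas as pd

# Hypothetical rows with invented vote counts
toy = pd.DataFrame({
    "listed_in(type)": ["Dining", "Dining", "Buffet", "Buffet", "Cafes"],
    "votes": [500, 600, 100, 200, 300],
})

# Total votes per restaurant type
votes_by_type = toy.groupby("listed_in(type)")["votes"].sum()

# Type with the highest total
top_type = votes_by_type.idxmax()

print(votes_by_type)
print(top_type)
```

The result is a Series indexed by restaurant type, which is why it can be plotted directly with the type on the x-axis.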
Q3. What are the ratings that the majority of restaurants have received?
dataframe.head()
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
plt.hist(dataframe['rate'],bins=5)
plt.title("ratings distribution")
plt.show()
Code Explanation
plt.hist draws a histogram. We pass dataframe['rate'] because this is the column we are analyzing.
bins=5 splits the range of ratings into 5 equal-width intervals, and each interval is drawn as one bar.
Conclusion
The majority of restaurants received ratings between 3.5 and 4.
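To see what bins=5 does numerically, the same binning can be computed with np.histogram, which returns the count per bin and the bin edges. The ratings below are hypothetical values chosen for illustration, not the real column.

```python
import numpy as np

# Hypothetical ratings
ratings = np.array([3.0, 3.4, 3.7, 3.7, 3.8, 3.8, 4.0, 4.1, 4.4, 4.5])

# Five equal-width bins between the smallest and largest rating
counts, edges = np.histogram(ratings, bins=5)

# Index of the fullest bin and its interval
fullest = int(counts.argmax())
interval = (edges[fullest], edges[fullest + 1])

print(counts)
print(edges)
print(interval)
```

Five bins always produce six edges; the tallest bar in the histogram corresponds to the interval stored in interval.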
Q4. Zomato has observed that most couples order most of their food online. What is their average spending on each order?
Average Order Spending on Food by Couples
dataframe.head()
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
couple_data=dataframe['approx_cost(for two people)']
sns.countplot(x=couple_data)
<Axes: xlabel='approx_cost(for two people)', ylabel='count'>
About the Code
1) A variable named couple_data is created.
2) It holds the approx_cost(for two people) column of the dataframe.
3) We draw a count plot.
4) couple_data is passed as the x-axis.
Conclusion
The majority of couples prefer restaurants with an approximate cost of 300 rupees.
The company can use this by showing these customers items priced around 300 rupees rather than more expensive ones, so that sales improve (take the example of budget-targeted iPhone ads).
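The count plot shows the most common cost, while the question asks about the average. Both can be computed directly. The sketch below runs on hypothetical rows (the costs are invented), and restricting the mean to rows where online_order is "Yes" is an assumption about how the question could be read, not something the notebook does.

```python
import pandas as pd

# Hypothetical rows with invented costs
toy = pd.DataFrame({
    "online_order": ["Yes", "Yes", "No", "Yes", "No"],
    "approx_cost(for two people)": [300, 300, 800, 600, 200],
})

cost = toy["approx_cost(for two people)"]

# Most frequent cost: the tallest bar of the count plot
most_common_cost = int(cost.mode().iloc[0])

# Average cost over all rows
average_cost = float(cost.mean())

# Average cost only for restaurants that accept online orders (assumed reading)
online_average = float(cost[toy["online_order"] == "Yes"].mean())

print(most_common_cost, average_cost, online_average)
```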
Q5. Which mode (online or offline) has received the maximum rating?
dataframe.head()
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
plt.figure(figsize = (6,6))
sns.boxplot(x = 'online_order', y= 'rate', data = dataframe)
<Axes: xlabel='online_order', ylabel='rate'>
Code
1) plt.figure creates the figure and sets its size.
2) sns.boxplot draws the box plot, taking online_order for the x-axis and rate for the y-axis from the dataframe.
Conclusion
Restaurants that accept online orders clearly receive higher ratings, while offline orders receive lower ratings in comparison with the online mode.
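The comparison that the box plot shows visually can be backed up with summary statistics per group. This is a minimal sketch on hypothetical ratings; the median corresponds to the line inside each box.

```python
import pandas as pd

# Hypothetical ratings for online and offline restaurants
toy = pd.DataFrame({
    "online_order": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "rate": [4.1, 3.8, 4.0, 3.7, 3.3, 3.0],
})

# Median and maximum rating for each ordering mode
summary = toy.groupby("online_order")["rate"].agg(["median", "max"])

print(summary)
```

Comparing medians rather than single maximum values gives a more robust picture, since one highly rated restaurant can dominate the maximum.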
Q6. Which type of restaurant received more offline orders, so that Zomato can provide those customers with some good offers?
dataframe.head()
name online_order book_table rate votes \
0 Jalsa Yes Yes 4.1 775
1 Spice Elephant Yes No 4.1 787
2 San Churro Cafe Yes No 3.8 918
3 Addhuri Udupi Bhojana No No 3.7 88
4 Grand Village No No 3.8 166
approx_cost(for two people) listed_in(type)
0 800 Buffet
1 800 Buffet
2 800 Buffet
3 300 Buffet
4 600 Buffet
pivot_table = dataframe.pivot_table(index='listed_in(type)',
columns='online_order', aggfunc='size', fill_value=0)
sns.heatmap(pivot_table, annot=True, cmap="YlGnBu",fmt='d')
plt.title("Heatmap")
plt.xlabel("Online Order")
plt.ylabel("Listed In (Type)")
plt.show()
Code Explanation
1) We create a pivot table because we need a table of counts here, and we store it in a variable named pivot_table.
2) The columns required for this question are listed_in(type) and online_order.
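The kind of count table that feeds the heatmap can be sketched on a few hypothetical rows. Note that this example uses pd.crosstab, a different pandas function from the pivot_table call in the notebook, chosen here because it counts combinations of two columns in a single call.

```python
import pandas as pd

# Hypothetical rows: restaurant type and whether online ordering is offered
toy = pd.DataFrame({
    "listed_in(type)": ["Buffet", "Dining", "Dining", "Cafes", "Dining", "Cafes"],
    "online_order": ["Yes", "No", "No", "Yes", "Yes", "Yes"],
})

# Rows = restaurant type, columns = online_order, cells = number of restaurants
table = pd.crosstab(toy["listed_in(type)"], toy["online_order"])

# Type with the most restaurants that do not take online orders
most_offline = table["No"].idxmax()

print(table)
print(most_offline)
```

Combinations that never occur are filled with 0, which plays the same role as fill_value=0 in the pivot table.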