Python For Data Science - Project Support
This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Agenda
Python Programming & EDA Quiz
Getting Started with Data Analysis
Common Statistical Measures
Significance of Data Visualization
Choosing plots for Univariate/Bivariate Analysis
Projects - Business Context & objective
Projects - Submission Guidelines
Projects - Q/A
Let’s begin the discussion by answering a few questions on Python programming and Exploratory Data Analysis
Python Programming & EDA Quiz
Which of the following represents the correct sequence of steps to begin data
analysis?
A Import Libraries => EDA => Load dataset
B Load dataset => Import Libraries => EDA
D Import Libraries => Load dataset => EDA
Getting Started with Data Analysis
1 Importing Packages
In this step, we import all the necessary packages such as numpy, pandas, matplotlib, seaborn etc.
2 Loading the Dataset
Using pandas functions, we load the dataset in a dataframe. For csv files, pd.read_csv( ) is used. For excel files, pd.read_excel( ) is used.
3 Exploratory Data Analysis
In this step, we look at the shape of the dataset and the different data types, check for anomalous and missing values, and analyse the attributes individually as well as the relationships between them through visualizations to identify key business insights
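The three steps above can be sketched as follows. This is only a minimal illustration: the inline CSV text and its column names (region, amount) are made up so that the snippet runs without an external file. In a real project, step 2 would call pd.read_csv( ) on the dataset file instead.

```python
# Step 1: import the packages (pandas is enough for this sketch)
import io

import pandas as pd

# Step 2: load the dataset into a dataframe.
# Hypothetical inline data standing in for a real CSV file on disk.
csv_text = """region,amount
North,120.5
South,
East,98.0
North,101.25
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 3: first EDA checks
n_rows, n_cols = df.shape                 # size of the dataset
column_types = df.dtypes                  # data type of each column
missing_per_column = df.isnull().sum()    # number of missing values per column

print(n_rows, n_cols)
print(column_types)
print(missing_per_column)
```

Visual analysis of individual attributes and of relationships between them would normally follow these checks.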
Python Programming & EDA Quiz
Consider the file Project_order.csv stored in the following folder hierarchy
Python => Project => Dataset
Which of the following code snippets is the correct way to load the file into a
pandas dataframe in Google Colab?
A df = pd.read_csv("Python/Project/Dataset/Project_order.csv")
B df = pd.read_csv("Python\Project\Dataset\Project_order.csv")
C df = pd.read_csv("Python//Project//Dataset//Project_order.csv")
D df = pd.read_csv("Python\\Project\\Dataset\\Project_order.csv")
Loading Datasets in Pandas
read_csv - pandas function used to load datasets in CSV format into a pandas dataframe
Syntax: df = pd.read_csv("file_path/file_name.csv")
Pandas has to be imported with alias pd - import pandas as pd
The file name has to be enclosed in quotation marks (single or double)
The file name alone (without a path) works when the file (dataset) is in the same working directory as the Python notebook
When the file (dataset) and the Python notebook are not in the same working directory, the path to the file has to be specified
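The difference between the two cases can be sketched as below. The folder layout and the file name sample.csv are hypothetical and are created on the fly in a temporary folder so that the snippet is runnable anywhere.

```python
import os
import tempfile

import pandas as pd

# Create a throwaway folder containing a small CSV (hypothetical data and names).
work_dir = tempfile.mkdtemp()
data_dir = os.path.join(work_dir, "data")
os.makedirs(data_dir)
file_path = os.path.join(data_dir, "sample.csv")
pd.DataFrame({"id": [1, 2, 3], "score": [7.5, 8.0, 6.5]}).to_csv(file_path, index=False)

# Case 1: the file is NOT in the working directory, so the path is specified.
df_from_path = pd.read_csv(file_path)

# Case 2: the file IS in the working directory, so the file name alone is enough.
previous_dir = os.getcwd()
os.chdir(data_dir)
try:
    df_from_name = pd.read_csv("sample.csv")
finally:
    os.chdir(previous_dir)  # restore the original working directory

print(df_from_path.equals(df_from_name))
```

Both calls read the same file, so the two dataframes hold the same data.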
Python Programming & EDA Quiz
Which of the following measures condense the dataset down to one
representative central value?
A Mean, Median, Mode
B Standard Deviation, Variance, Range
C Correlation Coefficient
D Maximum, Median, Minimum
Common Statistical Measures
Central tendency measures condense the dataset down to one representative central value
Allows us to compare one dataset to another
Mean
The mean is the arithmetic average of a set of given numbers.
df['column_name'].mean( )
The mean can be used to represent the typical value and therefore serves as a yardstick for all observations.
Median
The median is the middle score in a set of given numbers.
df['column_name'].median( )
Since the mean is highly affected by the outliers, the median is a better choice for a dataset with extreme values.
Mode
The mode is the most frequent score in a set of given numbers.
df['column_name'].mode( )[0]
Mode is the preferred measure when data is categorical.
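A small sketch of the three measures, using made-up data: the salary column contains one extreme value to show how it pulls the mean away from the median, and the city column is categorical, so only the mode applies to it. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: one numeric column with an extreme value and one categorical column.
df = pd.DataFrame({
    "salary": [30, 32, 35, 38, 40, 500],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi"],
})

mean_salary = df["salary"].mean()      # pulled upwards by the extreme value 500
median_salary = df["salary"].median()  # middle of the sorted values, robust to the extreme value
mode_city = df["city"].mode()[0]       # most frequent category

print(mean_salary, median_salary, mode_city)
```

Here the mean is 112.5 while the median is 36.5, which is much closer to what a typical row looks like.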
Python Programming & EDA Quiz
Consider a dataframe df with two attributes "height" and "weight". Which of
the following methods can be used to check the correlation between these
two variables?
A df.corr()
B sns.heatmap(df)
C sns.histplot(data=df, x='height')
D plt.scatter(df['height'], df['weight'])
Common Statistical Measures
Correlation is a measure of association between two variables
Correlation coefficient is a statistical measure of the strength of the linear relationship
between two variables.
df.corr()
sns.heatmap(df.corr())
plt.scatter(df['height'], df['weight'])
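As a minimal sketch, the correlation coefficient for the height and weight columns can be read from the matrix returned by df.corr( ). The measurements below are made up for illustration.

```python
import pandas as pd

# Hypothetical measurements: heights in cm and weights in kg.
df = pd.DataFrame({
    "height": [150, 160, 165, 172, 180, 188],
    "weight": [52, 58, 63, 70, 79, 85],
})

corr_matrix = df.corr()                   # pairwise Pearson correlation by default
r = corr_matrix.loc["height", "weight"]   # coefficient for this pair of columns

print(corr_matrix)
print(round(r, 3))
```

The diagonal of the matrix is always 1, since every column is perfectly correlated with itself.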
Based on the direction of change in the value of one variable as the value of the other changes, the two variables are said to have a positive relationship, a negative relationship, or no relationship at all.
[Figure: three scatter plots showing +ve correlation, -ve correlation, and no correlation]
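The three cases can be illustrated with made-up numbers. The series names are hypothetical; the third series was chosen so that it has no linear trend with x.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 6])

rising = 2 * x + 1                          # increases as x increases
falling = 10 - 3 * x                        # decreases as x increases
unrelated = pd.Series([2, 5, 1, 1, 5, 2])   # no linear trend with x

r_positive = x.corr(rising)      # close to +1
r_negative = x.corr(falling)     # close to -1
r_none = x.corr(unrelated)       # close to 0

print(r_positive, r_negative, r_none)
```

The sign of the coefficient gives the direction of the relationship, and its magnitude gives the strength of the linear association.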
Python Programming & EDA Quiz
Consider a dataframe df containing information about CustomerID, Region, Purchase Amount. Which of the following statements is true?
A df.info() provides information about data types of columns
B df.value_counts('CustomerID') returns a single number representing the total count of the values in the 'CustomerID' column
C df.describe() returns the counts, mean, standard deviation, min, max, and quartiles of numeric columns
D df.groupby('Region')['Purchase Amount'].sum() provides the total sum of amount of purchase by different regions
Pandas
One of the most commonly used Python libraries for data manipulation and analysis
The df.head( ) function returns the first 5 rows of the dataframe (by default)
The df.shape attribute returns the number of rows and columns of the dataframe
The df.astype( ) function converts the data type of an existing column in a dataframe
The df.info( ) function returns information about the dataframe including the data types of each column and memory usage
The df.describe( ) function returns the statistical info like percentile, mean, standard deviation, etc. of the dataframe
The df['column_name'].unique( ) function returns the unique values present in a column of the dataframe
The df.groupby( ) function is used to split the data into groups
The df.value_counts( ) function returns a Series containing the counts of unique values
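A short sketch that exercises each of these functions on a made-up dataframe. The column names follow the quiz question above; the values are hypothetical, and the purchase amounts are deliberately stored as text so that astype( ) has something to convert.

```python
import pandas as pd

# Hypothetical purchase records.
df = pd.DataFrame({
    "CustomerID": [101, 102, 101, 103, 102, 101],
    "Region": ["East", "West", "East", "North", "West", "East"],
    "Purchase Amount": ["250", "100", "50", "300", "150", "200"],  # stored as text
})

first_rows = df.head()        # first 5 rows by default
n_rows, n_cols = df.shape     # (rows, columns)

# Convert the text column to a numeric type before doing arithmetic on it.
df["Purchase Amount"] = df["Purchase Amount"].astype(float)

df.info()                     # prints data types, non-null counts, and memory usage
summary = df.describe()       # count, mean, std, min, quartiles, max of numeric columns
regions = df["Region"].unique()                                 # unique() is called on one column
region_totals = df.groupby("Region")["Purchase Amount"].sum()   # total purchase per region
orders_per_customer = df["CustomerID"].value_counts()           # one count per distinct value

print(region_totals)
print(orders_per_customer)
```

Note that value_counts( ) returns one count per distinct value rather than a single number.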
Python Programming & EDA Quiz
According to the jointplot below, where is the highest density of data points?
A Total bill ~(15 – 20) & Tip ~(3 - 4)
B Total bill ~(5 - 10) & Tip ~(1.5 - 2.5)
C Total bill ~(10 - 20) & Tip ~(1.5 - 2.5)
D Total bill ~(25 - 35) & Tip ~(3 - 4)
Significance of Data Visualization
Gives us a better idea of the information stored in data by giving it visual context through
various plots
Allows us to visualize large volumes of data in an understandable and coherent way
Also enables us to identify relationships and patterns within data
Helps us comprehend the information and draw conclusions and insights
Enables data storytelling to easily create a narrative through graphics and diagrams
Python Programming & EDA Quiz
Which of the following combinations of plot and type of data are generally used for univariate analysis?
A Boxplot - Numerical Data
B Histogram - Numerical Data
C Lineplot - Categorical Data
D Countplot - Categorical Data
Choosing plots for Univariate Analysis
When to use a Histogram
When the data is numeric and you want to see the shape of the data distribution, for example to determine whether the data is approximately normally distributed (bell shaped) or not
sns.histplot(data=df, x='column_name', kde=True)
When to use a Boxplot
When the data is numeric and you want to understand the centre, spread, and presence of outliers
sns.boxplot(data=df, x='column_name')
When to use a Count plot
When the data is categorical and you want to show the counts of
observations in each categorical bin
sns.countplot(data=df, x='column_name')
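A minimal sketch of a count plot for one categorical column, with made-up data (the column name day is hypothetical). The value_counts( ) call returns the same counts as numbers, which is handy when they need to be quoted in a report.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; can be dropped in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical categorical data.
df = pd.DataFrame({"day": ["Sat", "Sun", "Sat", "Fri", "Sun", "Sat"]})

ax = sns.countplot(data=df, x="day")    # one bar per category
counts = df["day"].value_counts()       # the same counts as numbers

print(counts)
```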
Choosing plots for Bivariate Analysis
When to use a scatter plot
When the data is numeric and you want to determine whether the two variables are related, and see if it's a positive or negative correlation.
sns.scatterplot(data=df, x='column_x', y='column_y')
When to use a line chart
When the data is continuous and you want to see how the value of something changes over short and long periods of time.
sns.lineplot(data=df, x='column_x', y='column_y')
Graded Project - Coded vs Guided
There are two ways to work on the project for the course
Coded way: Write the solution code from scratch, create a business report based on the
output from the code, and submit both the Python notebook and the business report.
Guided way: Use an existing template notebook to build the solution, create a business report based on the output from the code, and submit only the business report.
The guided project is designed to assist learners who are beginners in coding and guide them
in their projects.
The guided way involves a deduction of 15%, i.e., even if all the requirements of the project are completed optimally, the maximum grade that can be obtained would be 85%
This deduction is levied owing to the lower amount of effort required to complete the project the guided way compared to the coded way
Coded Project - Business Context and Objective
Analysts are required to explore data and reflect on the insights. Clear writing skill is an integral part of a good report. Note that the explanations must be such that readers with minimum knowledge of analytics are able to grasp the insight.
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback models. In its recent board meeting, concerns were raised by the members on the efficiency of the marketing campaign currently being used. The board decides to rope in an analytics professional to improve the existing campaign.
They want to analyze the data to get a fair idea about the demand of customers which will
help them in enhancing their customer experience. Suppose you are a Data Scientist at the
company and the Data Science team has shared some of the key questions that need to be
answered. Perform the data analysis to find answers to these questions that will help the
company to improve the business.
Coded Project - Data Dictionary
Data - Data Description
Age - The age of the individual in years
Gender - The gender of the individual, categorized as male or female
Profession - The occupation or profession of the individual
Marital_status - The marital status of the individual, such as married or single
Education - The educational qualification of the individual: Graduate or Post Graduate
No_of_Dependents - The number of dependents (e.g., children, elderly parents) that the individual supports financially
Personal_loan - A binary variable indicating whether the individual has taken a personal loan ("Yes" or "No")
House_loan - A binary variable indicating whether the individual has taken a housing loan ("Yes" or "No")
Partner_working - A binary variable indicating whether the individual's partner is employed ("Yes" or "No")
Salary - The individual's salary or income
Partner_salary - The salary or income of the individual's partner, if applicable
Total_salary - The total combined salary of the individual and their partner (if applicable)
Price - The price of a product or service
Make - The type of automobile
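One possible sanity check on these columns is sketched below. It rests on an assumption that is not stated in the data dictionary: that Total_salary equals Salary plus Partner_salary, with a missing partner salary counted as 0. The rows are made up and do not come from the project dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows that follow the data dictionary; the real dataset is not reproduced here.
df = pd.DataFrame({
    "Salary": [60000, 45000, 80000],
    "Partner_salary": [30000, np.nan, 0],
    "Partner_working": ["Yes", "No", "No"],
    "Total_salary": [90000, 45000, 80000],
})

# Assumption: Total_salary = Salary + Partner_salary, missing partner salary counted as 0.
expected_total = df["Salary"] + df["Partner_salary"].fillna(0)

# Rows where the stored total disagrees with the assumed formula.
mismatch = df[expected_total != df["Total_salary"]]

print(len(mismatch))
```

If the assumption holds for the real data, a check like this helps decide how missing values in Partner_salary could be treated.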
Guided Project - Business Context and Objective
The number of restaurants in New York is increasing day by day. Lots of students and busy
professionals rely on those restaurants due to their hectic lifestyles. Online food delivery
service is a great option for them. It provides them with good food from their favorite
restaurants. A food aggregator company FoodHub offers access to multiple restaurants
through a single smartphone app.
The app allows restaurants to receive a direct online order from a customer. The app assigns
a delivery person from the company to pick up the order after it is confirmed by the
restaurant. The delivery person then uses the map to reach the restaurant and waits for the
food package. Once the food package is handed over to the delivery person, he/she confirms
the pick-up in the app and travels to the customer's location to deliver the food. The delivery
person confirms the drop-off in the app after delivering the food package to the customer.
The customer can rate the order in the app. The food aggregator earns money by collecting a
fixed margin of the delivery order from the restaurants.
The food aggregator company has stored the data of the different orders made by the
registered customers in their online portal. They want to analyze the data to get a fair idea
about the demand of different restaurants which will help them in enhancing their customer
experience. Suppose you are hired as a Data Scientist in this company and the Data Science
team has shared some of the key questions that need to be answered. Perform the data
analysis to find answers to these questions that will help the company to improve the
business.
Guided Project - Data Dictionary
Data - Data Description
order_id - Unique ID of the order
customer_id - ID of the customer who ordered the food
restaurant_name - Name of the restaurant
cuisine_type - Cuisine ordered by the customer
cost - Cost of the order
day_of_the_week - Indicates whether the order is placed on a weekday or weekend (the weekday is from Monday to Friday and the weekend is Saturday and Sunday)
rating - Rating given by the customer out of 5
food_preparation_time - Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
delivery_time - Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off information
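As an illustration of how these columns can be combined, the sketch below adds the two time columns to get the time from the restaurant's order confirmation to drop-off, and summarises delivery time by day type. The rows are made up, and total_time is a derived column introduced here for illustration; it is not part of the data dictionary.

```python
import pandas as pd

# Made-up orders that follow the data dictionary.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "cuisine_type": ["Italian", "Japanese", "Italian", "Indian"],
    "day_of_the_week": ["Weekend", "Weekday", "Weekend", "Weekend"],
    "food_preparation_time": [25, 30, 20, 35],
    "delivery_time": [20, 28, 15, 25],
})

# Derived column (illustrative): minutes from order confirmation to drop-off.
df["total_time"] = df["food_preparation_time"] + df["delivery_time"]

# Average delivery time on weekdays versus weekends.
mean_delivery_by_day = df.groupby("day_of_the_week")["delivery_time"].mean()

# Number of orders per cuisine.
orders_by_cuisine = df["cuisine_type"].value_counts()

print(df[["order_id", "total_time"]])
print(mean_delivery_by_day)
print(orders_by_cuisine)
```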
How to Load Dataset in Google Colab
Step 1: Upload the CSV file to Google Drive
Step 2: Create a new notebook / open an existing notebook
Step 3: Import the pandas library into the notebook. The following code can be used for this:
import pandas as pd
Step 4: Mount Google Drive in the notebook. This can be done via two approaches:
Approach 1
Step i: Click on the Files option on the left
Step ii: Select the Mount Drive option
Step iii: In the pop-up that appears, select the Connect to Google Drive option
Approach 2
Step i: Run the following command in the notebook
from google.colab import drive
drive.mount('/content/drive')
Step ii: In the pop-up that appears, select the Connect to Google Drive option
Step 5: Expand the Drive option, and browse to your working directory
Step 6: Right-click on the file and select Copy path
For example, if we want to load the file Project.csv, which is present in the Colab Notebooks
folder in MyDrive, we would navigate to the folder and right-click on the file to get the file path
Step 7: Create a variable path and set the copied file path as the value of the variable (you can simply paste the copied file path for this, enclosed in quotation marks)
Step 8: Pass the path variable as an argument to the pandas read_csv() function to load the file into a pandas dataframe and store it in a variable
For example: df = pd.read_csv(path)
Step 9: Call the head() function of the dataframe to check if the data is imported correctly
For example: df.head()
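Steps 7 to 9 can be sketched as follows. In Colab, the copied path for the example file would typically look like "/content/drive/MyDrive/Colab Notebooks/Project.csv". To keep the sketch runnable outside Colab, a temporary folder stands in for the mounted Drive and a small made-up CSV is written into it; both are assumptions of this example.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the mounted Drive (assumption: lets the sketch run outside Colab).
drive_root = tempfile.mkdtemp()
folder = os.path.join(drive_root, "MyDrive", "Colab Notebooks")
os.makedirs(folder)
pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6], "value": [10, 20, 30, 40, 50, 60]}
).to_csv(os.path.join(folder, "Project.csv"), index=False)

# Step 7: store the file path in a variable.
# In Colab this would be the pasted path, written as a quoted string.
path = os.path.join(folder, "Project.csv")

# Step 8: pass the variable to read_csv() and keep the resulting dataframe.
df = pd.read_csv(path)

# Step 9: look at the first rows to confirm the import worked.
preview = df.head()
print(preview)
```

If read_csv() raises a FileNotFoundError in Colab, the usual causes are an unmounted Drive or a path that was not copied in full.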
How to Load Dataset in Jupyter Notebook
Step 1: Download the CSV file you want to work with
Step 2: Locate the file in the Local Drive
Step 3: Right-click on the file and click on Properties and copy the file location
Step 4: Import numpy and pandas
Step 5: Paste the path in the variable path and add the filename at the end, as shown below
It is important to replace each single backslash (i.e., \) in the file path with a double backslash (i.e., \\), a single forward slash (i.e., /), or a double forward slash (i.e., //), because Python treats a backslash inside a string as the start of an escape sequence.
For example: if the file name is Project.csv and the file path is C:\Users\User\Downloads, then the path variable should be defined as one of the following:
path = 'C:\\Users\\User\\Downloads\\Project.csv'
path = 'C:/Users/User/Downloads/Project.csv'
path = 'C://Users//User//Downloads//Project.csv'
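The sketch below checks that the three spellings describe the same Windows path. It uses PureWindowsPath from the standard library, which applies Windows path rules on any operating system without touching the file system. The raw string (r'...') is a fourth option that is not mentioned above; it is included here as an addition. With single backslashes in a normal string, the \U in C:\Users would be read as the start of a Unicode escape and Python would reject the string.

```python
from pathlib import PureWindowsPath

# The three spellings from the example above.
double_backslash = 'C:\\Users\\User\\Downloads\\Project.csv'
forward_slash = 'C:/Users/User/Downloads/Project.csv'
double_forward_slash = 'C://Users//User//Downloads//Project.csv'

# Addition (not in the steps above): a raw string switches off escape processing.
raw_string = r'C:\Users\User\Downloads\Project.csv'

# '\\' in source code is a single backslash in the actual string.
print(double_backslash)

# Compare the spellings under Windows path rules.
candidates = (double_backslash, forward_slash, double_forward_slash, raw_string)
paths = [PureWindowsPath(p) for p in candidates]
all_same = all(p == paths[0] for p in paths)

print(all_same)
```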
Step 6: Pass the path variable to the read_csv() function of pandas to load the file into a pandas dataframe, and store it in a variable
For example: df = pd.read_csv(path)
Step 7: Call the head() function of the dataframe to check if the data is imported correctly
For example: df.head()
Coded Project - Submission Guidelines
There are two parts to the submission for the project:
1) Business report [.pdf]: This file will be the primary criterion for evaluation
2) Supporting file [.ipynb]: This file will be used to validate the content of the business report
Please note that in case the business report is not submitted, the assessment will be graded
zero
Submitting the supporting file is mandatory and in case it is not submitted, the assessment will
be graded zero
As the business report is the primary criterion for evaluation, kindly make sure that all the
required information asked in the rubric is included in the business report
Guided Project - Submission Guidelines
Download the dataset and the Template Notebook
Fill in the blanks in the notebook to complete and execute the code to solve the questions and
perform all the tasks as per the grading rubric
Once the notebook is completely executed and the necessary outputs are obtained, a business report has to be created
Only the business report should be submitted as a PDF file (.pdf)
Kindly make sure that all the required information asked in the rubric is included in the business
report
Project - Q/A
Happy Learning!