MLS 5 - Python Project Support

The document outlines a training module on Python programming and exploratory data analysis (EDA), including quizzes and key concepts such as data loading, statistical measures, and data visualization. It emphasizes the importance of data visualization for understanding and interpreting data, as well as the proper sequence for data analysis. The content is proprietary and intended for personal use only, with legal restrictions on sharing.

Uploaded by

Jitendra Asati
Copyright: © All Rights Reserved

Python For Data Science - Project Support

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Agenda

Python Programming & EDA Quiz

Getting Started with Data Analysis

Common Statistical Measures

Significance of Data Visualization

Choosing plots for Univariate/Bivariate Analysis

Projects - Business Context & objective

Projects - Submission Guidelines

Projects - Q/A

Let’s begin the discussion by answering a few questions on Python programming and Exploratory Data Analysis

Python Programming & EDA Quiz

Which of the following represents the correct sequence of steps to begin data
analysis?

A Import Libraries => EDA => Load dataset


B Load dataset => Import Libraries => EDA

D Import Libraries => Load dataset => EDA

Getting Started with Data Analysis

1 Importing Packages

In this step, we import all the necessary packages such as numpy, pandas, matplotlib, seaborn, etc.

2 Loading the Dataset

Using pandas functions, we load the dataset into a dataframe. For CSV files, 'pd.read_csv( )' is used.
For Excel files, 'pd.read_excel( )' is used.

3 Exploratory Data Analysis

In this step, we look at the shape of the dataset and the different data
types, check for anomalous and missing values, and analyse the
attributes individually as well as the relationships between them through
visualizations to identify key business insights.
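The three steps above can be sketched as follows. This is a minimal illustration, not the course's own notebook: the small in-memory CSV and its column names (height, weight, city) are made up so the example runs without any file on disk.

```python
import io

# Step 1: import the packages (numpy, matplotlib, seaborn are added as needed)
import pandas as pd

# Step 2: load the dataset. A tiny in-memory CSV stands in for a real file
# (hypothetical data, for illustration only).
csv_text = """height,weight,city
150,50,Pune
160,58,Delhi
170,,Pune
180,80,Mumbai
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 3: first EDA checks
print(df.shape)            # number of rows and columns
print(df.dtypes)           # data type of each column
print(df.isnull().sum())   # missing values per column
```

With a real project file, only the read_csv argument changes; the checks in step 3 stay the same.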

Python Programming & EDA Quiz

Consider the file Project_order.csv stored in the following folder hierarchy


Python => Project => Dataset
Which of the following code snippets is the correct way to load the file into a
pandas dataframe in Google Colab?

A df = pd.read_csv("Python/Project/Dataset/Project_order.csv")

B df = pd.read_csv("Python\Project\Dataset\Project_order.csv")

C df = pd.read_csv("Python//Project//Dataset//Project_order.csv")

D df = pd.read_csv("Python\\Project\\Dataset\\Project_order.csv")
Loading Datasets in Pandas
read_csv - pandas function used to load datasets in CSV format into a pandas dataframe

Syntax: df = pd.read_csv("file_path/file_name.csv")

Pandas has to be imported first, conventionally with the alias pd - import pandas as pd

The file path has to be enclosed in quotation marks (single or double)

When the file (dataset) is in the same working directory as the Python notebook,
the file name alone is enough

When the file (dataset) and the Python notebook are not in the same working directory,
the path to the file has to be specified
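As a rough sketch of the last point, the snippet below writes a small CSV into a separate temporary folder and then loads it by passing the full path to read_csv. The folder name Dataset, the file name example.csv, and its columns are assumptions chosen only for this illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a throwaway folder and CSV so the example is self-contained.
data_dir = Path(tempfile.mkdtemp()) / "Dataset"
data_dir.mkdir()
file_path = data_dir / "example.csv"
file_path.write_text("order_id,cost\n1,12.5\n2,30.0\n")

# The script does not run inside data_dir, so the full path is passed to read_csv.
df = pd.read_csv(file_path)
print(df)
```

read_csv accepts either a string or a pathlib.Path, so str(file_path) would work equally well.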

Python Programming & EDA Quiz

Which of the following measures condense the dataset down to one representative central value?

A Mean, Median, Mode

B Standard Deviation, Variance, Range

C Correlation Coefficient

D Maximum, Median, Minimum


Common Statistical Measures

Central tendency measures condense the dataset down to one representative central value

Allows us to compare one dataset to another

Mean

The mean is the arithmetic average of a set of given numbers.

df['column_name'].mean( )

The mean can be used to represent the typical value and therefore serves as a yardstick for
all observations.

Median

The median is the middle score in a set of given numbers once they are sorted.

df['column_name'].median( )

Since the mean is highly affected by outliers, the median is a better choice for a dataset with
extreme values.

Mode

The mode is the most frequent score in a set of given numbers.

df['column_name'].mode( )[0]

Mode is the preferred measure when data is categorical.
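A small worked example may help show why the median is preferred when extreme values are present. The dataframe below is invented for illustration; the column names salary and city are not from the project datasets.

```python
import pandas as pd

# Hypothetical data: one salary (400) is far larger than the rest.
df = pd.DataFrame({
    "salary": [30, 32, 35, 38, 400],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Pune"],
})

mean_salary = df["salary"].mean()      # pulled upward by the extreme value
median_salary = df["salary"].median()  # middle value, unaffected by the extreme value
mode_city = df["city"].mode()[0]       # most frequent category

print(mean_salary, median_salary, mode_city)
```

Here the mean is 107.0 while the median is 35.0, so the median is much closer to what a typical row looks like. mode() returns a Series because there can be ties, which is why [0] is used to pick the first mode.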

Python Programming & EDA Quiz

Consider a dataframe df with two attributes "height" and "weight". Which of
the following methods can be used to check the correlation between these
two variables?

A df.corr()

B sns.heatmap(df)

C sns.histplot(data=df, x='height')

D plt.scatter(df['height'], df['weight'])
Common Statistical Measures
Correlation is a measure of association between two variables

Correlation coefficient is a statistical measure of the strength of the linear relationship
between two variables.

df.corr()

sns.heatmap(df.corr())

plt.scatter(df['height'], df['weight'])
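The three checks can be tried together on a small made-up dataframe. This is a sketch under assumptions: the height and weight values are invented, and the heatmap is drawn from the correlation matrix returned by df.corr(), since heatmap simply colors whatever table of numbers it is given.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical measurements
df = pd.DataFrame({
    "height": [150, 160, 165, 172, 180, 188],
    "weight": [52, 58, 63, 70, 79, 90],
})

# Numeric check: pairwise correlation coefficients
corr_matrix = df.corr()
print(corr_matrix)

# Visual checks: heatmap of the correlation matrix and a scatter plot
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, ax=axes[0])
axes[1].scatter(df["height"], df["weight"])
axes[1].set_xlabel("height")
axes[1].set_ylabel("weight")
```

The scatter plot shows the shape of the relationship, while df.corr() and the heatmap summarize its strength as a number between -1 and 1.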

Common Statistical Measures
Based on the direction of change in the value of one variable as the value of the other changes, the
two variables are said to have a positive relationship, a negative relationship, or no relationship
at all.

[Figure: three scatter plots showing positive correlation, negative correlation, and no correlation]
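The three cases can be reproduced numerically with simulated data. The sketch below uses NumPy only; the sample size, the random seed, and the slopes are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
x = rng.normal(size=200)
noise = rng.normal(scale=0.5, size=200)

y_pos = 2 * x + noise            # tends to rise as x rises
y_neg = -2 * x + noise           # tends to fall as x rises
y_none = rng.normal(size=200)    # generated independently of x

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
r_none = np.corrcoef(x, y_none)[0, 1]

print(round(r_pos, 2), round(r_neg, 2), round(r_none, 2))
```

The first coefficient should come out close to +1, the second close to -1, and the third close to 0.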

Python Programming & EDA Quiz

Consider a dataframe df containing information about CustomerID, Region,
Purchase Amount. Which of the following statements is true?

A df.info() provides information about data types of columns

B df.value_counts('CustomerID') returns a single number
representing the total count of the values in the 'CustomerID' column

C df.describe() returns the counts, mean, standard deviation, min, max,
and quartiles of numeric columns

D df.groupby('Region')['Purchase Amount'].sum() provides the
total purchase amount for each region
Pandas

One of the most commonly used Python libraries for data manipulation and analysis

df.head( )
The df.head( ) function returns the first 5 rows of the dataframe by default

df.shape
The df.shape attribute returns the number of rows and columns of the dataframe

df.astype( )
The df.astype( ) function converts the data type of an existing column in a dataframe

df.info( )
The df.info( ) function prints information about the dataframe including the data types
of each column and memory usage

df.describe( )
The df.describe( ) function returns statistical info like percentiles, mean, standard
deviation, etc. of the numeric columns of the dataframe

df['column_name'].unique( )
The unique( ) function returns the unique values present in a column of the dataframe

df.groupby( )
The df.groupby( ) function is used to split the data into groups

df.value_counts( )
The value_counts( ) function returns a Series containing the counts of unique values
(unique rows when called on the whole dataframe)
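The functions above can be tried on a small dataframe. The data below is invented for illustration; it reuses the CustomerID, Region, and Purchase Amount column names from the quiz, with the purchase amounts deliberately stored as text so that astype has something to convert.

```python
import pandas as pd

# Hypothetical data; "Purchase Amount" is stored as text on purpose.
df = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C1", "C3", "C2", "C1"],
    "Region": ["North", "South", "North", "East", "South", "North"],
    "Purchase Amount": ["100", "250", "80", "40", "60", "20"],
})

print(df.head())   # first 5 rows
print(df.shape)    # (6, 3)
df.info()          # prints column data types and memory usage

# Convert the text column to a numeric type before doing arithmetic on it.
df["Purchase Amount"] = df["Purchase Amount"].astype(float)

print(df.describe())           # summary statistics of the numeric columns
print(df["Region"].unique())   # distinct values in one column

counts = df["CustomerID"].value_counts()                 # one count per customer
totals = df.groupby("Region")["Purchase Amount"].sum()   # one total per region
print(counts)
print(totals)
```

Note that value_counts returns a Series with one entry per distinct value, not a single number, and that groupby needs an aggregation such as sum() before it produces a result.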
Python Programming & EDA Quiz

According to the jointplot below, where is the highest density of data points?

A Total bill ~(15 – 20) & Tip ~(3 - 4)


B Total bill ~(5 - 10) & Tip ~(1.5 - 2.5)

C Total bill ~(10 - 20) & Tip ~(1.5 - 2.5)

D Total bill ~(25 - 35) & Tip ~(3 - 4)


Significance of Data Visualization

Gives us a better idea of the information stored in data by giving it visual context through
various plots

Allows us to visualize large volumes of data in an understandable and coherent way

Also enables us to identify relationships and patterns within data

Helps us comprehend the information and draw conclusions and insights

Enables data storytelling to easily create a narrative through graphics and diagrams

Python Programming & EDA Quiz

Which of the following combinations of plot and type of data are generally
used for univariate analysis?

A Boxplot - Numerical Data

B Histogram - Numerical data

C Lineplot - Categorical Data

D Countplot - Categorical Data


Choosing plots for Univariate Analysis
When to use a Histogram

When the data is numeric and you want to see the shape of the data
distribution, for example to determine whether the data is distributed
approximately normally (bell shaped) or not

sns.histplot(data=df, x='column_name', kde=True)

When to use a Boxplot

When the data is numeric and you want to understand the centre,
spread, and presence of outliers

sns.boxplot(data=df, x='column_name')

Choosing plots for Univariate Analysis
When to use a Count plot

When the data is categorical and you want to show the counts of
observations in each category

sns.countplot(data=df, x='column_name')
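A count plot is the visual counterpart of value_counts(). The example below is a sketch on invented data; the cuisine values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical categorical column
df = pd.DataFrame(
    {"cuisine_type": ["Indian", "Italian", "Indian", "Thai", "Italian", "Indian"]}
)

# One bar per category; bar height is the number of rows in that category.
ax = sns.countplot(data=df, x="cuisine_type")

# The same numbers as a table, for comparison with the bars
counts = df["cuisine_type"].value_counts()
print(counts)
```

The tallest bar corresponds to the most frequent category, which is also the mode of the column.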

Choosing plots for Bivariate Analysis
When to use a scatter plot

When both variables are numeric and you want to determine whether the two
variables are related, and see if it's a positive or negative correlation.

sns.scatterplot(data=df, x='x_column', y='y_column')

When to use a line chart

When the data is continuous and you want to see how the value
of something changes over short and long periods of time.

sns.lineplot(data=df, x='x_column', y='y_column')

Graded Project - Coded vs Guided
There are two ways to work on the project for the course

Coded way: Write the solution code from scratch, create a business report based on the
output from the code, and submit both the Python notebook and the business report.

Guided way: Use an existing template notebook to build the solution, create a business report
based on the output from the code, and submit only the business report

Graded Project - Coded vs Guided

The guided project is designed to assist learners who are beginners in coding and guide them
in their projects.

The guided way involves a deduction of 15%, i.e., even if all the requirements of the project are
completed optimally, the maximum grade that can be obtained would be 85%

This deduction is levied because the guided way requires less effort to complete than the
coded way of attempting the project

Coded Project - Business Context and Objective

Analysts are required to explore data and reflect on the insights. Clear writing skill is an
integral part of a good report. Note that the explanations must be such that readers with
minimum knowledge of analytics are able to grasp the insight.

Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and
Hatchback models. In its recent board meeting, concerns were raised by the members on the
efficiency of the marketing campaign currently being used. The board decides to rope in an
analytics professional to improve the existing campaign.

Coded Project - Business Context and Objective

They want to analyze the data to get a fair idea about the demand of customers which will
help them in enhancing their customer experience. Suppose you are a Data Scientist at the
company and the Data Science team has shared some of the key questions that need to be
answered. Perform the data analysis to find answers to these questions that will help the
company to improve the business.
Coded Project - Data Dictionary

Data                Data Description

Age                 The age of the individual in years

Gender              The gender of the individual, categorized as male or female

Profession          The occupation or profession of the individual

Marital_status      The marital status of the individual, such as married or single

Education           The educational qualification of the individual: Graduate or Post Graduate

No_of_Dependents    The number of dependents (e.g., children, elderly parents) that the individual
                    supports financially

Personal_loan       A binary variable indicating whether the individual has taken a personal loan:
                    "Yes" or "No"
Coded Project - Data Dictionary

Data                Data Description

House_loan          A binary variable indicating whether the individual has taken a housing loan:
                    "Yes" or "No"

Partner_working     A binary variable indicating whether the individual's partner is employed:
                    "Yes" or "No"

Salary              The individual's salary or income

Partner_salary      The salary or income of the individual's partner, if applicable

Total_salary        The total combined salary of the individual and their partner (if applicable)

Price               The price of a product or service

Make                The type of automobile


Guided Project - Business Context and Objective

The number of restaurants in New York is increasing day by day. Lots of students and busy
professionals rely on those restaurants due to their hectic lifestyles. Online food delivery
service is a great option for them. It provides them with good food from their favorite
restaurants. A food aggregator company FoodHub offers access to multiple restaurants
through a single smartphone app.

The app allows restaurants to receive a direct online order from a customer. The app assigns
a delivery person from the company to pick up the order after it is confirmed by the
restaurant. The delivery person then uses the map to reach the restaurant and waits for the
food package. Once the food package is handed over to the delivery person, he/she confirms
the pick-up in the app and travels to the customer's location to deliver the food. The delivery
person confirms the drop-off in the app after delivering the food package to the customer.
The customer can rate the order in the app. The food aggregator earns money by collecting a
fixed margin of the delivery order from the restaurants.

Guided Project - Business Context and Objective

The food aggregator company has stored the data of the different orders made by the
registered customers in their online portal. They want to analyze the data to get a fair idea
about the demand of different restaurants which will help them in enhancing their customer
experience. Suppose you are hired as a Data Scientist in this company and the Data Science
team has shared some of the key questions that need to be answered. Perform the data
analysis to find answers to these questions that will help the company to improve the
business.
Guided Project - Data Dictionary

Data                Data Description

order_id            Unique ID of the order

customer_id         ID of the customer who ordered the food

restaurant_name     Name of the restaurant

cuisine_type        Cuisine ordered by the customer

cost                Cost of the order

day_of_the_week     Indicates whether the order is placed on a weekday or weekend (The weekday
                    is from Monday to Friday and the weekend is Saturday and Sunday)

Guided Project - Data Dictionary

Data                     Data Description

rating                   Rating given by the customer out of 5

food_preparation_time    Time (in minutes) taken by the restaurant to prepare the food. This is
                         calculated by taking the difference between the timestamps of the
                         restaurant's order confirmation and the delivery person's pick-up
                         confirmation.

delivery_time            Time (in minutes) taken by the delivery person to deliver the food package.
                         This is calculated by taking the difference between the timestamps of the
                         delivery person's pick-up confirmation and drop-off information

How to Load Dataset in Google Colab

Step 1: Upload the CSV file to Google Drive

Step 2: Create a new notebook / open an existing notebook

Step 3: Import the pandas library into the notebook. The following code can be used for this:

import pandas as pd

Step 4: Mount Google Drive in the notebook. This can be done via two approaches:

How to Load Dataset in Google Colab

Approach 1

Step i: Click on the Files option on the left

Step ii: Select the Mount Drive option


Step iii: In the pop-up that appears, select Connect to Google Drive option

How to Load Dataset in Google Colab

Approach 2

Step i: Run the following command in the notebook

from google.colab import drive
drive.mount('/content/drive')

Step ii: In the pop-up that appears, select Connect to Google Drive option

How to Load Dataset in Google Colab

Step 5: Expand the Drive option, and browse to your working directory

Step 6: Right-click on the file and select Copy path

For example, if we want to load the file Project.csv, which is present in the Colab Notebooks
folder in MyDrive, we would navigate to the folder and right-click on the file to get the file path
How to Load Dataset in Google Colab

Step 7: Create a variable path and set the copied file path as the value of the variable (you can
simply paste the copied file path for this)

Step 8: Pass the path variable as an argument of the pandas read_csv() function to load the file
into a pandas dataframe and store it in a variable
For example: df = pd.read_csv(path)

Step 9: Call the head() function of the dataframe to check if the data is imported correctly

For example: df.head()
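Steps 7 to 9 can be put together as in the sketch below. The path shown is an assumption: it is what the copied path would typically look like when Drive is mounted at /content/drive and Project.csv sits in the Colab Notebooks folder of MyDrive, so the path copied in Step 6 should be pasted in its place. The existence check is an addition for this illustration so the snippet also runs outside Colab.

```python
import os

import pandas as pd

# Step 7: paste the copied file path here (assumed example path).
path = "/content/drive/MyDrive/Colab Notebooks/Project.csv"

if os.path.exists(path):
    df = pd.read_csv(path)   # Step 8: load the file into a dataframe
    print(df.head())         # Step 9: check that the data was imported correctly
else:
    print("File not found. Check that Drive is mounted and the path is correct.")
```

If the "File not found" message appears in Colab, the most common causes are that the mount step was skipped or that the pasted path has a typo.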

How to Load Dataset in Jupyter Notebook

Step 1: Download the CSV file you want to work with

Step 2: Locate the file in the Local Drive

Step 3: Right-click on the file and click on Properties and copy the file location
Step 4: Import numpy and pandas

How to Load Dataset in Jupyter Notebook

Step 5: Paste the path in the variable path and add the filename at the end, as shown below

Python treats a single backslash in an ordinary string as the start of an escape sequence, so it is
important to replace each single backslash (i.e., \) in the file path with a double backslash
(i.e., \\), a single forward slash (i.e., /), or a double forward slash (i.e., //).

For example: if the filename is Project.csv and the file path is C:\Users\User\Downloads,
then the path variable should be defined as one of the following:

path = 'C:\\Users\\User\\Downloads\\Project.csv'

path = 'C:/Users/User/Downloads/Project.csv'

path = 'C://Users//User//Downloads//Project.csv'
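
The three forms above all describe the same file. As a minimal sketch, the standard-library class pathlib.PureWindowsPath can be used to confirm this on any operating system. The raw-string form (r'...') in the last entry is not part of the list above; it is included here as an additional option that keeps single backslashes. Note that writing 'C:\Users\...' without any of these fixes is a syntax error in Python 3, because \U starts a Unicode escape.

```python
from pathlib import PureWindowsPath

# The three forms from the example above, plus a raw string (r'...'),
# which is another standard Python way to keep single backslashes.
candidates = [
    'C:\\Users\\User\\Downloads\\Project.csv',
    'C:/Users/User/Downloads/Project.csv',
    'C://Users//User//Downloads//Project.csv',
    r'C:\Users\User\Downloads\Project.csv',
]

# Parse each string with Windows path rules, regardless of the current OS
parsed = [PureWindowsPath(p) for p in candidates]

all_same = all(p == parsed[0] for p in parsed)  # do all forms name the same file?
file_name = parsed[0].name                      # last component of the path
print(all_same, file_name)
```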

Step 6: Pass the path variable to the read_csv() function of pandas to load the file into a pandas
dataframe, and store the result in a variable

For example: df = pd.read_csv(path)

Step 7: Call the head() function of the dataframe to check if the data is imported correctly

For example: df.head()
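
Steps 4 to 7 can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's own notebook: the file contents are made up, and a temporary folder stands in for the Downloads folder so that the example runs on any machine with pandas installed.

```python
import os
import tempfile

import pandas as pd

# Create a small sample CSV so the example runs anywhere (made-up data)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "Project.csv")
with open(path, "w") as f:
    f.write("id,score\n1,90\n2,85\n3,77\n")

df = pd.read_csv(path)  # Step 6: load the file into a dataframe
print(df.head())        # Step 7: show the first rows (up to 5 by default)
print(df.shape)         # (number of rows, number of columns)
```

Checking df.shape alongside df.head() gives a quick confirmation that the whole file was read and not only its first few rows.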

Coded Project - Submission Guidelines

There are two parts to the submission for the project:

1) Business report [.pdf]: This file will be the primary criterion for evaluation

2) Supporting file [.ipynb]: This file will be used to validate the content of the business report
Please note that in case the business report is not submitted, the assessment will be graded
zero

Submitting the supporting file is mandatory and in case it is not submitted, the assessment will
be graded zero

As the business report is the primary criterion for evaluation, kindly make sure that all the
required information asked in the rubric is included in the business report
Guided Project - Submission Guidelines

Download the dataset and the Template Notebook

Fill in the blanks in the notebook to complete and execute the code to solve the questions and
perform all the tasks as per the grading rubric

Once the notebook is completely executed and the necessary outputs are obtained, a business report
has to be created

Only the business report should be submitted as a PDF file (.pdf)

Kindly make sure that all the required information asked in the rubric is included in the business
report

Project - Q/A

Happy Learning!