Data Analysis With Python - Prof. Pinaki Das

The document outlines an Executive Training Program on Data Analysis with Python, led by Dr. Pinaki Das. It covers basic concepts of data analysis, the advantages of using Python and Jupyter Notebooks, and hands-on training with libraries such as Pandas, NumPy, and Matplotlib. Participants will learn to perform data analysis, including summary statistics, correlation, regression, and data visualization.


Data Analysis with Python

Executive Training Program on Python

[September 19, 2025]

Dr. Pinaki Das

Professor
Department of Economics
Vidyasagar University

Dr. Pinaki Das, Dept. of Economics, Vidyasagar University
Outline of the Presentation
Part I: Basic Concepts
• What is Data Analysis?
• Why Python for Data Analysis?
• Why Jupyter Notebook?
• Three Core Libraries: Pandas, NumPy, Matplotlib
• Understanding Summary Statistics
• Relation and Prediction
Part II: Hands-on Data Analysis with Python
• Install and Set Up Jupyter Notebook
• Import Libraries (Pandas, NumPy, Matplotlib)
• Load Data from Excel File
• Explore Data – Summary Statistics, Skewness, Kurtosis
• Correlation and Regression
• Visualize Data with Matplotlib
Part I: Basic Concepts

1. What is Data Analysis?
Definition
• The systematic process of cleaning, transforming, and interpreting data
• Goal: extract meaningful insights for decision-making
Why It Matters
• Converts raw numbers into knowledge
• Helps in identifying trends, patterns, and relationships
• Supports evidence-based decisions in research, business, and policy
Simple Example
• Raw Data (Excel table): Height & Weight of students; Child Mortality (CM) and its co-factors, etc.
• Analysis (Python): Find summary statistics, correlations, create graphs
• Insight: Taller students tend to weigh more; CM is higher where TFR is higher

2. Why Python for Data Analysis?
Ease of Use
• Simple, readable syntax → beginner-friendly
• Works seamlessly with Excel, CSV, SQL
Powerful Libraries
• NumPy → fast numerical operations
• Pandas → data handling & analysis
• Matplotlib → visualization
Scalability & Flexibility
• Handles small to very large datasets
• Extensible for machine learning, AI, and big data
Community Support
• Large global community
• Abundant tutorials, examples, and resources
3. Why Jupyter Notebook?

Familiarize Yourself with Jupyter Notebooks

Jupyter Notebooks are an excellent tool for data analysis because they allow you
to combine executable Python code with Markdown notes in a single document.
This makes your work more readable and easier to share with others.

● Start by creating a new notebook: Launch Jupyter Notebook from Anaconda Navigator or the command line, and create a new notebook.

● Learn the basics: Get comfortable with the interface, learn how to add and
delete cells, and understand the difference between code cells and Markdown
cells.

4. Python Libraries for Data Analysis
Now, we're ready to dive into the libraries that will be our bread and butter for data
analysis: Pandas, NumPy, and Matplotlib.
Pandas
● Learn to import data, clean data, manipulate dataframes, and perform basic data
analysis tasks.
NumPy
● Get familiar with NumPy arrays and operations, which are foundational for
numerical computing in Python.
Matplotlib
● Learn to create basic plots like line graphs, scatter plots, and histograms to
visualize your data.
Additional libraries:
4. scikit-learn (sklearn) – for regression and machine learning models.
5. statsmodels – for detailed regression analysis and model summaries.
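As a small illustration of how the core libraries divide the work, the sketch below uses a NumPy array for the arithmetic and a Pandas DataFrame for the labeled table. The height and weight values are made up for illustration and are not the training data.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements (not from the training dataset)
heights_cm = np.array([150, 160, 170, 180])
weights_kg = np.array([50, 58, 66, 74])

# NumPy: element-wise arithmetic on the whole array, no explicit loop
heights_m = heights_cm / 100

# Pandas: the same values as a labeled table with named columns
df = pd.DataFrame({"Height": heights_cm, "Weight": weights_kg})
print(heights_m)
print(df.mean())
```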

5. Summary Statistics: The First Step in Data Analysis

What are Summary Statistics?

• Numerical measures that summarize key features of data
• Provide a quick overview before detailed analysis
Common Measures
• Central Tendency: Mean, Median, Mode
• Dispersion: Range, Variance, Standard Deviation
• Shape: Skewness, Kurtosis
Why Important?
• Identify patterns and anomalies early
• Check data quality (outliers, missing values)
• Build foundation for further analysis (correlation, regression, visualization)
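The measures of central tendency and dispersion listed above can each be computed with one Pandas call. This is a minimal sketch on a hypothetical list of heights; the values are illustrative only.

```python
import pandas as pd

# Hypothetical heights in cm (illustrative values only)
heights = pd.Series([150, 155, 160, 160, 165, 170, 175])

summary = {
    "mean": heights.mean(),
    "median": heights.median(),
    "mode": heights.mode()[0],               # mode() returns a Series; take the first value
    "range": heights.max() - heights.min(),
    "variance": heights.var(),               # sample variance (divides by n - 1)
    "std": heights.std(),                    # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {float(value):.2f}")
```

Note that Pandas uses the sample (n - 1) definitions of variance and standard deviation by default.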

6. Data Analysis: Relation and Prediction
Correlation

Simple Regression Model

Multiple Regression Model
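
For reference, these three ideas are commonly written as follows. The notation (r for the correlation coefficient, β for the regression coefficients, u for the error term) is a standard textbook convention assumed here, not taken from the slides.

```latex
% Pearson correlation coefficient between X and Y
r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

% Simple regression model
Y_i = \beta_0 + \beta_1 X_i + u_i

% Multiple regression model with k predictors
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i
```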

Part II: Hands-on Data Analysis with Python

1. Install and Import Pandas

Step 1: Install Pandas

First, ensure that Pandas is installed. The Anaconda distribution comes with Pandas, but if you need to install it manually, you can do so by running the following command in your Jupyter notebook:

!pip install pandas

Step 2: Import Pandas

At the beginning of your notebook, import the Pandas library. It's common
practice to import Pandas with the alias pd:

import pandas as pd

2. Load Data and Verify
Step 3: Load Your Excel File
Use the read_excel() function from Pandas to load your Excel file. You'll need to
know the path to your file. If the file is in the same directory as your Jupyter
notebook, you only need to specify the filename. Otherwise, provide the full file
path.
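df = pd.read_excel('your_file_name.xlsx')
For a full Windows path, use a raw string (prefix r) so the backslashes are not read as escape sequences:
df = pd.read_excel(r"C:\Users\pdasv\OneDrive\Desktop\pinaki\Data_Analysis.xlsx")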
If your Excel file has multiple sheets and you want to load a specific sheet, you
can specify the sheet name or its index (starting from 0) using the sheet_name
parameter:
df = pd.read_excel('your_file_name.xlsx', sheet_name='Cor')
Step 4: Verify the Data
After loading the data, it's a good practice to verify it by viewing the first few
rows. You can do this by using the head() method, which displays the first five
rows by default:
df.head()
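Beyond the first few rows, it can help to check the size, column types, and missing values of the loaded data. The sketch below uses a small hand-built DataFrame as a stand-in for the result of read_excel(); its values, including the deliberate gap, are hypothetical.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pd.read_excel (hypothetical values)
df = pd.DataFrame({
    "Height": [150, 160, None, 180, 175],
    "Weight": [52, 57, 68, 73, 70],
})

print(df.shape)          # (number of rows, number of columns)
print(df.dtypes)         # data type of each column
print(df.isna().sum())   # count of missing values per column
```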
3. Analysis of the Height and Weight Data
1. Summary Statistics
df.describe()

For specific summary statistics, say median and mode:
df['Height'].median()
df['Height'].mode()
OR
median_height = df['Height'].median()
print("Median of Height:", median_height)
# To get the first mode value
mode_height = df['Height'].mode()[0]
print("Mode of Height:", mode_height)
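
The outline also lists skewness and kurtosis, which describe() does not report. A minimal sketch using the Pandas skew() and kurt() methods follows; the two columns are hypothetical, one symmetric and one with a long right tail, and "Income" is an invented column name.

```python
import pandas as pd

# Hypothetical data: one symmetric column, one with a long right tail
df = pd.DataFrame({
    "Height": [150, 155, 160, 165, 170],
    "Income": [1, 1, 1, 2, 10],
})

skewness = df.skew()   # 0 for symmetric data, positive for a long right tail
kurtosis = df.kurt()   # excess kurtosis: a normal distribution scores 0
print(skewness)
print(kurtosis)
```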

3. Relation between Height and Weight Data

2. Correlation
df.corr()
OR
correlation_matrix = df.corr()
print(correlation_matrix)
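
If the sheet also contains text columns, df.corr() can raise an error in Pandas 2.0 and later. One way around this, sketched here on hypothetical data with an invented "Name" column, is to select the numeric columns first.

```python
import pandas as pd

# Hypothetical data with one non-numeric column
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Height": [150, 160, 170, 180],
    "Weight": [52, 57, 68, 73],
})

# Keep numeric columns only, then compute pairwise Pearson correlations
correlation_matrix = df.select_dtypes(include="number").corr()
print(correlation_matrix)

# Correlation between two specific columns
r = df["Height"].corr(df["Weight"])
print("Correlation between Height and Weight:", round(r, 4))
```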

4. Prediction:
Step-by-step Simple Linear Regression (Weight = f(Height))
Step 1. Import Libraries
First, ensure scikit-learn is installed or install it using pip, and then import the
necessary module:
!pip install scikit-learn
from sklearn.linear_model import LinearRegression
import numpy as np
Step 2. Prepare Data
scikit-learn expects the features (X values) as a 2D array, so a single column usually needs to be reshaped:
# Reshape data into a 2D array for scikit-learn
X = df['Height'].values.reshape(-1,1)
Y = df['Weight']

Regression (continued)
Step 3. Create and Fit the Model
Instantiate the LinearRegression object, fit it to your data, and print the coefficients:
model = LinearRegression()
model.fit(X, Y)
print("Coefficient (Slope):", model.coef_[0])
print("Intercept:", model.intercept_)
Step 4. Predict and Evaluate
To predict and evaluate the model on the same data (for simplicity):
# Make predictions
Y_pred = model.predict(X)
# Calculating the R-squared value to assess the fit
from sklearn.metrics import r2_score
print("R-squared:", r2_score(Y, Y_pred))
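
To see what the slope, intercept, and R-squared represent, the same quantities can be computed by hand from the least-squares formulas. This sketch uses NumPy only (not scikit-learn) and hypothetical height and weight values.

```python
import numpy as np

# Hypothetical data (illustrative values only)
height = np.array([150.0, 160.0, 170.0, 180.0])
weight = np.array([52.0, 57.0, 68.0, 73.0])

# Closed-form least-squares estimates
dx = height - height.mean()
dy = weight - weight.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = weight.mean() - slope * height.mean()

# R-squared = 1 - (residual sum of squares / total sum of squares)
predicted = intercept + slope * height
r_squared = 1 - ((weight - predicted) ** 2).sum() / (dy ** 2).sum()

print("Slope:", round(slope, 4))
print("Intercept:", round(intercept, 4))
print("R-squared:", round(r_squared, 4))
```

For a simple regression, R-squared equals the square of the correlation between the two variables.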

4. Data Visualisation
Step 5. Visualize the Regression Line
Use Matplotlib to plot the data points and the regression line:
import matplotlib.pyplot as plt
# Plot the raw data
plt.scatter(X, Y, color='blue')
# Plot the regression line
plt.plot(X, Y_pred, color='red')
plt.title('Height vs Weight Regression')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()

Alternative Method: Regression with Height and Weight Data
Using statsmodels for Detailed Regression Analysis
1. Install and Import statsmodels
If not already installed, you can install statsmodels using pip. Then, import the necessary parts of the library:
!pip install statsmodels
import statsmodels.api as sm
2. Prepare Data
Unlike scikit-learn, which fits an intercept by default, statsmodels requires you to add a constant to your predictor variable array to account for the intercept:
# Predictor variable
X = df['Height']
# Adds a constant term to the predictor, which is required for the intercept
X = sm.add_constant(X)
# Response variable
Y = df['Weight']
3. Fit the Model
Create a model object using OLS (Ordinary Least Squares), fit it, and then print the summary:
# Create an OLS model
model = sm.OLS(Y, X)
# Fit the model
results = model.fit()
# Print the results summary
print(results.summary())
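
The effect of the constant term can be seen by fitting the same data with and without a column of ones. This sketch uses NumPy's least-squares solver rather than statsmodels, and the data values are hypothetical.

```python
import numpy as np

# Hypothetical data (illustrative values only)
height = np.array([150.0, 160.0, 170.0, 180.0])
weight = np.array([52.0, 57.0, 68.0, 73.0])

# With a column of ones: the first coefficient is the intercept
X_with_const = np.column_stack([np.ones_like(height), height])
coef_with, *_ = np.linalg.lstsq(X_with_const, weight, rcond=None)

# Without it: the fitted line is forced through the origin
X_no_const = height.reshape(-1, 1)
coef_without, *_ = np.linalg.lstsq(X_no_const, weight, rcond=None)

print("With constant    -> intercept, slope:", coef_with)
print("Without constant -> slope only:", coef_without)
```

Leaving out the constant changes the slope, because the line no longer has a free intercept.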

5. Analysis of the data that contain CM, FLR…

Step 1: Load Data from a Specific Sheet

If you haven't already, make sure Pandas is installed, and then use it to load the data from the specific sheet containing the variables CM, FLR, PCI, and TFR. Suppose this sheet is named 'CM':
import pandas as pd
# Load the data from a specific sheet
data = pd.read_excel('path_to_your_file.xlsx', sheet_name='CM')
Step 2: Verify the Data
It's a good practice to check the first few rows of the DataFrame to ensure that
the data has been loaded correctly:
data.head()
data.describe()

5. Analysis of the data that contain CM, FLR…
Step 3. Correlation
Calculate the correlation between any two variables (say CM and FLR)
data['CM'].corr(data['FLR'])
OR
correlation = data['CM'].corr(data['FLR'])
print("Correlation between CM and FLR:", correlation)
Step 4: Perform Regression Analysis
If you need to perform regression analysis where CM depends on FLR, PCI, and TFR, you can use the statsmodels library as previously explained:
import statsmodels.api as sm
# Prepare data for regression
# Predictor variables
X = data[['FLR', 'PCI', 'TFR']]
# Adds a constant term to the predictors
X = sm.add_constant(X)
# Dependent variable
Y = data['CM']
# Fit the model
model = sm.OLS(Y, X).fit()
model.summary()
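As a rough check on what a multiple regression estimates, the sketch below builds a synthetic dataset in which CM is constructed from arbitrary, known coefficients and then recovers them by least squares. It uses NumPy rather than statsmodels. The numbers are invented so the result can be verified and say nothing about the real relationship between these variables.

```python
import numpy as np
import pandas as pd

# Synthetic data: CM is built from known coefficients so the fit can be checked
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "FLR": rng.uniform(20, 90, size=30),
    "PCI": rng.uniform(500, 5000, size=30),
    "TFR": rng.uniform(1.5, 7.0, size=30),
})
data["CM"] = 100 - 0.5 * data["FLR"] - 0.01 * data["PCI"] + 10 * data["TFR"]

# Design matrix: a column of ones (intercept) followed by the predictors
X = np.column_stack([np.ones(len(data)), data[["FLR", "PCI", "TFR"]].to_numpy()])
Y = data["CM"].to_numpy()

coefficients, *_ = np.linalg.lstsq(X, Y, rcond=None)
for name, value in zip(["const", "FLR", "PCI", "TFR"], coefficients):
    print(f"{name}: {value:.4f}")
```

Because the synthetic CM contains no noise, the fitted coefficients match the ones used to build it. With real data they would be estimates with standard errors, as reported in the statsmodels summary.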
Thanks

