Data Analysis with Python

Executive Training Program on Python

[September 19, 2025]

Dr. Pinaki Das


Professor
Department of Economics
Vidyasagar University

Outline of the Presentation
Part I: Basic Concepts
• What is Data Analysis?
• Why Python for Data Analysis?
• Why Jupyter Notebook?
• Three Core Libraries: Pandas, NumPy, Matplotlib
• Understanding Summary Statistics
• Relation and Prediction
Part II: Hands-on Data Analysis with Python
• Install and Set Up Jupyter Notebook
• Import Libraries (Pandas, NumPy, Matplotlib)
• Load Data from Excel File
• Explore Data – Summary Statistics, Skewness, Kurtosis
• Correlation and Regression
• Visualize Data with Matplotlib
Part I: Basic Concepts
1. What is Data Analysis?
Definition
• The systematic process of cleaning, transforming, and interpreting data
• Goal: extract meaningful insights for decision-making
Why It Matters
• Converts raw numbers into knowledge
• Helps identify trends, patterns, and relationships
• Supports evidence-based decisions in research, business, and policy
Simple Example
• Raw Data (Excel table): Height & Weight of students; Child Mortality (CM) and its co-factors, etc.
• Analysis (Python): Find summary statistics, correlations, create graphs
• Insight: Taller students tend to weigh more; CM tends to be higher where TFR is higher

2. Why Python for Data Analysis?
Ease of Use
• Simple, readable syntax → beginner-friendly
• Works seamlessly with Excel, CSV, and SQL
Powerful Libraries
• NumPy → fast numerical operations
• Pandas → data handling & analysis
• Matplotlib → visualization
Scalability & Flexibility
• Handles small to very large datasets
• Extensible for machine learning, AI, and big data
Community Support
• Large global community
• Abundant tutorials, examples, and resources
3. Why Jupyter Notebooks?

Familiarize Yourself with Jupyter Notebooks

Jupyter Notebooks are an excellent tool for data analysis because they allow you
to combine executable Python code with Markdown notes in a single document.
This makes your work more readable and easier to share with others.

● Start by creating a new notebook: Launch Jupyter Notebook from Anaconda
Navigator or the command line, and create a new notebook.

● Learn the basics: Get comfortable with the interface, learn how to add and
delete cells, and understand the difference between code cells and Markdown
cells.

4. Python Libraries for Data Analysis
Now, we're ready to dive into the libraries that will be our bread and butter for data
analysis: Pandas, NumPy, and Matplotlib.
Pandas
● Learn to import data, clean data, manipulate dataframes, and perform basic data
analysis tasks.
NumPy
● Get familiar with NumPy arrays and operations, which are foundational for
numerical computing in Python.
Matplotlib
● Learn to create basic plots like line graphs, scatter plots, and histograms to
visualize your data.
Additional libraries:
4. scikit-learn (sklearn) – for regression and machine learning models.
5. statsmodels – for detailed regression analysis and model summaries.
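To make the division of labour between NumPy and Pandas concrete, here is a minimal sketch. The height and weight values (and the assumption that they are in cm and kg) are made up for illustration and are not the training dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only
heights_cm = np.array([150.0, 160.0, 165.0, 172.0, 180.0])
weights_kg = np.array([48.0, 55.0, 60.0, 68.0, 77.0])

# NumPy: element-wise arithmetic on whole arrays, no explicit loop
heights_m = heights_cm / 100

# Pandas: a labelled table (DataFrame) built from those arrays
df = pd.DataFrame({"Height": heights_cm, "Weight": weights_kg})
df["Height_m"] = heights_m

print(df.head())
print(type(df["Height"].to_numpy()))  # each column can be viewed as a NumPy array
```

Matplotlib then takes columns such as df["Height"] and df["Weight"] as plot inputs.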

5. Summary Statistics: The First Step in Data Analysis

What are Summary Statistics?
• Numerical measures that summarize key features of data
• Provide a quick overview before detailed analysis
Common Measures
• Central Tendency: Mean, Median, Mode
• Dispersion: Range, Variance, Standard Deviation
• Shape: Skewness, Kurtosis
Why Important?
• Identify patterns and anomalies early
• Check data quality (outliers, missing values)
• Build foundation for further analysis (Correlation, Regression, visualization)
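The three families of measures listed above can be sketched on a single Pandas Series. The heights below are hypothetical values chosen for illustration; note that Pandas reports the sample variance and standard deviation (dividing by n - 1) and excess kurtosis.

```python
import pandas as pd

# Hypothetical heights in cm, for illustration only
heights = pd.Series([150, 155, 160, 160, 165, 170, 190])

# Central tendency
mean_h = heights.mean()
median_h = heights.median()
mode_h = heights.mode()[0]   # mode() returns a Series; take the first value

# Dispersion
range_h = heights.max() - heights.min()
var_h = heights.var()        # sample variance (divides by n - 1)
std_h = heights.std()        # sample standard deviation

# Shape
skew_h = heights.skew()      # a positive value suggests a longer right tail
kurt_h = heights.kurt()      # excess kurtosis (about 0 for a normal distribution)

print("Mean:", mean_h, "Median:", median_h, "Mode:", mode_h)
print("Range:", range_h, "Variance:", var_h, "Std. dev.:", std_h)
print("Skewness:", skew_h, "Kurtosis:", kurt_h)
```

Here the single large value (190) pulls the mean above the median, which is the kind of early signal summary statistics are meant to give.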

6. Data Analysis: Relation and Prediction
Correlation

Simple Regression Model

Multiple Regression Model
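The formulas for this slide did not survive in the text. For reference, the standard textbook forms are sketched below; the symbols (r, β, u) are conventional notation assumed here, not taken from the original slide.

```latex
% Pearson correlation coefficient between X and Y
r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

% Simple regression model (one predictor)
Y_i = \beta_0 + \beta_1 X_i + u_i

% Multiple regression model (k predictors)
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i
```

In Part II, Weight = f(Height) is a simple regression, and the regression of CM on FLR, PCI, and TFR is a multiple regression with k = 3.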

Part II: Hands-on Data Analysis with Python

1. Install and Import Pandas

Step 1: Install Pandas

First, ensure that Pandas is installed. The Anaconda distribution comes with
Pandas, but if you need to install it manually, you can do so by running the
following command in your Jupyter notebook:

!pip install pandas

Step 2: Import Pandas

At the beginning of your notebook, import the Pandas library. It's common
practice to import Pandas with the alias pd:

import pandas as pd

2. Load Data and Verify
Step 3: Load Your Excel File
Use the read_excel() function from Pandas to load your Excel file. You'll need to
know the path to your file. If the file is in the same directory as your Jupyter
notebook, you only need to specify the filename. Otherwise, provide the full file
path.
df = pd.read_excel('your_file_name.xlsx')
# Example with a full Windows path (raw string, so the backslashes are read literally)
df = pd.read_excel(r"C:\Users\pdasv\OneDrive\Desktop\pinaki\Data_Analysis.xlsx")
If your Excel file has multiple sheets and you want to load a specific sheet, you
can specify the sheet name or its index (starting from 0) using the sheet_name
parameter:
df = pd.read_excel('your_file_name.xlsx', sheet_name='Cor')
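Path mistakes are the most common reason this step fails, so a small wrapper can make them easier to diagnose. The sketch below is one possible approach and not part of the original slides: the function name load_sheet is hypothetical, and it assumes an Excel engine such as openpyxl is installed for .xlsx files.

```python
from pathlib import Path

import pandas as pd


def load_sheet(path, sheet_name=0):
    """Load one sheet of an Excel file, failing early if the file is missing."""
    path = Path(path)
    if not path.exists():
        # Report the absolute location that was actually checked
        raise FileNotFoundError(f"Excel file not found: {path.resolve()}")
    # sheet_name accepts a sheet name (str) or a zero-based index (int)
    return pd.read_excel(path, sheet_name=sheet_name)
```

A call would look like df = load_sheet('your_file_name.xlsx', sheet_name='Cor'). On Windows, full paths are safest written as raw strings (r"C:\...") so that backslashes are not treated as escape characters.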
Step 4: Verify the Data
After loading the data, it's a good practice to verify it by viewing the first few
rows. You can do this by using the head() method, which displays the first five
rows by default:
df.head()
3. Analysis of the data that contain Height and Weight
1. Summary Statistics
df.describe()

For specific summary statistics, say the median and mode:
df['Height'].median()
df['Height'].mode()
OR
median_height = df['Height'].median()
print("Median of Height:", median_height)
# To get the first mode value
mode_height = df['Height'].mode()[0]
print("Mode of Height:", mode_height)
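The outline also lists skewness and kurtosis under data exploration. A minimal sketch of how these could be obtained alongside the other measures, using a small made-up DataFrame in place of the Excel data (the column names Height and Weight follow the slides; the values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from Excel
df = pd.DataFrame({
    "Height": [150, 155, 160, 160, 165, 170, 190],
    "Weight": [48, 52, 57, 58, 61, 66, 85],
})

# Several statistics for every column in one table
stats = df.agg(["mean", "median", "std", "skew", "kurt"])
print(stats)

# Count missing values per column before trusting the statistics
missing_counts = df.isna().sum()
print(missing_counts)
```

Each row of stats is one measure and each column is one variable, which makes the two variables easy to compare side by side.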

3. Relation between Height and Weight Data

2. Correlation
df.corr()
OR
correlation_matrix = df.corr()
print(correlation_matrix)
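If the sheet also contains text columns (a name or ID column, say), pandas 2.x raises an error when df.corr() is called on the whole DataFrame. A hedged sketch of two ways around this, using hypothetical data; the Name column is an assumption added only to show the issue:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],   # hypothetical text column
    "Height": [150, 160, 165, 172, 180],
    "Weight": [48, 55, 60, 68, 77],
})

# Option 1: restrict the matrix to numeric columns
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)

# Option 2: correlate one pair of columns directly (Pearson by default)
r = df["Height"].corr(df["Weight"])
print("Correlation between Height and Weight:", round(r, 3))
```

Both options give the same Height and Weight coefficient; the matrix form is more convenient when there are many numeric columns.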

4. Prediction:
Step-by-step Simple Linear Regression (Weight = f(Height))
Step 1. Import Libraries
First, ensure scikit-learn is installed or install it using pip, and then import the
necessary module:
!pip install scikit-learn
from sklearn.linear_model import LinearRegression
import numpy as np
Step 2. Prepare Data
scikit-learn expects a 2D array for the features (X values), so a single column
usually needs to be reshaped first:
# Reshape data into a 2D array for scikit-learn
X = df['Height'].values.reshape(-1,1)
Y = df['Weight']
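The reshape step is easier to follow when the shapes are printed. The sketch below, on hypothetical data, also shows an alternative that should serve the same purpose: selecting the column with double brackets, which keeps it as a two-dimensional DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from Excel
df = pd.DataFrame({"Height": [150, 160, 165, 172, 180],
                   "Weight": [48, 55, 60, 68, 77]})

one_d = df["Height"].values              # shape (5,): 1D, rejected by scikit-learn as X
X = df["Height"].values.reshape(-1, 1)   # shape (5, 1): -1 lets NumPy infer the row count
X_alt = df[["Height"]]                   # double brackets give a (5, 1) DataFrame
Y = df["Weight"]                         # the target may stay 1D

print(one_d.shape, X.shape, X_alt.shape, Y.shape)
```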

Regression (continued)
Step 3. Create and Fit the Model
Instantiate the LinearRegression object, fit it to your data, and print the coefficients:
model = LinearRegression()
model.fit(X, Y)
print("Coefficient (Slope):", model.coef_[0])
print("Intercept:", model.intercept_)
Step 4. Predict and Evaluate
To predict and evaluate the model on the same data (for simplicity):
# Make predictions
Y_pred = model.predict(X)
# Calculating the R-squared value to assess the fit
from sklearn.metrics import r2_score
print("R-squared:", r2_score(Y, Y_pred))
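Evaluating on the same data used for fitting tends to give an optimistic R-squared. One common alternative, sketched here under assumptions, is to hold out part of the data with scikit-learn's train_test_split. The data are synthetic (generated from an assumed line plus noise), and the 25% test share and random_state=0 are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: Weight = 0.9 * Height - 90 + noise
rng = np.random.default_rng(0)
height = np.linspace(150, 190, 40)
weight = 0.9 * height - 90 + rng.normal(0, 2, size=height.size)

X = height.reshape(-1, 1)
Y = weight

# Keep 25% of the rows aside for evaluation
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, Y_train)

# R-squared on rows the model has not seen
test_r2 = r2_score(Y_test, model.predict(X_test))
print("Test R-squared:", test_r2)

# Predict the weight for a new, hypothetical height of 175
predicted = model.predict(np.array([[175.0]]))[0]
print("Predicted weight:", predicted)
```

With a real dataset, a test R-squared well below the training value would suggest the model does not generalize as well as the in-sample fit implies.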

4. Data Visualisation
5. Visualize the Regression Line
Use Matplotlib to plot the data points and the regression line:
import matplotlib.pyplot as plt
# Plot the raw data
plt.scatter(X, Y, color='blue')
# Plot the regression line
plt.plot(X, Y_pred, color='red')
plt.title('Height vs Weight Regression')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()
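Histograms, mentioned in Part I among the basic plot types, follow the same pattern. A minimal sketch on hypothetical heights; the bin count of 5 is an arbitrary choice, and the Agg backend line is only there so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical heights in cm, for illustration only
heights = pd.Series([150, 152, 158, 160, 161, 165, 166, 170, 172, 180])

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(heights, bins=5, color="blue", edgecolor="black")
ax.set_title("Distribution of Height")
ax.set_xlabel("Height")
ax.set_ylabel("Number of students")

# In a notebook, call plt.show() here to display the figure
print(counts, bin_edges)
```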

Alternative Method: Regression with Height and Weight Data
Using statsmodels for Detailed Regression Analysis
1. Install and Import statsmodels
If not already installed, you can install statsmodels using pip. Then, import the
necessary parts of the library:
!pip install statsmodels
import statsmodels.api as sm
2. Prepare Data
Unlike scikit-learn, which fits an intercept by default, statsmodels requires you
to add a constant to your predictor variable array to account for the intercept:
# Predictor variable
X = df['Height']
# Adds a constant term to the predictor, which is required for the intercept
X = sm.add_constant(X)
# Response variable
Y = df['Weight']
3. Fit the Model
Create a model object using OLS (Ordinary Least Squares), fit it, and then print
the summary:
# Create an OLS model
model = sm.OLS(Y, X)
# Fit the model
results = model.fit()
# Print the results summary
print(results.summary())
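statsmodels also offers a formula interface that adds the intercept automatically, which avoids forgetting add_constant. A minimal sketch, assuming the same Height and Weight column names; the data below are synthetic and generated only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: Weight = 0.9 * Height - 90, plus a small alternating error
height = np.arange(150, 190, 2)
error = np.where(np.arange(height.size) % 2 == 0, 1.0, -1.0)
df = pd.DataFrame({"Height": height, "Weight": 0.9 * height - 90 + error})

# "Weight ~ Height" means: regress Weight on Height, intercept included
results = smf.ols("Weight ~ Height", data=df).fit()

print(results.params)     # Intercept and Height coefficient
print(results.rsquared)   # R-squared of the fit
```

Individual quantities such as results.params, results.pvalues, and results.rsquared can be read directly when the full summary table is more than is needed.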

5. Analysis of the data that contain CM, FLR…

Step 1: Load Data from a Specific Sheet


If you haven't already, make sure Pandas is installed, and then use it to load the
data from the specific sheet containing the variables CM, FLR, PCI, and TFR.
Suppose this sheet is named 'CM':
import pandas as pd
# Load the data from a specific sheet
data = pd.read_excel('path_to_your_file.xlsx', sheet_name='CM')
Step 2: Verify the Data
It's a good practice to check the first few rows of the DataFrame to ensure that
the data has been loaded correctly:
data.head()
data.describe()
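Because the later steps refer to columns by name, a quick check that the expected columns are present can save confusion. A sketch with a hypothetical stand-in DataFrame; the column list follows the variables named in this section, and the numbers are made up.

```python
import pandas as pd

# Hypothetical stand-in for the sheet loaded with pd.read_excel
data = pd.DataFrame({
    "CM": [120, 200, 90],
    "FLR": [40, 20, 70],
    "PCI": [1500, 300, 2500],
    "TFR": [6.0, 6.5, 3.5],
})

expected = ["CM", "FLR", "PCI", "TFR"]
missing = [col for col in expected if col not in data.columns]
print("Missing columns:", missing)

# Column types and non-null counts in one view
data.info()
```

A non-empty missing list usually points to a header typo or extra spaces in the Excel column names.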

Step 3: Correlation
Calculate the correlation between any two variables (say CM and FLR):
data['CM'].corr(data['FLR'])
OR
correlation = data['CM'].corr(data['FLR'])
print("Correlation between CM and FLR:", correlation)
Step 4: Perform Regression Analysis
If you need to perform regression analysis where CM depends on FLR, PCI, and TFR,
you can use the statsmodels library as previously explained:
import statsmodels.api as sm
# Prepare data for regression
# Predictor variables
X = data[['FLR', 'PCI', 'TFR']]
# Adds a constant term to the predictors
X = sm.add_constant(X)
# Dependent variable
Y = data['CM']
# Fit the model
model = sm.OLS(Y, X).fit()
model.summary()
Thanks
