Data Analytics Lab - Introduction

Uploaded by Govindan G.


21CS1711 DATA SCIENCE AND ANALYTICS LABORATORY
Exercise 1: Download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
• Reading data from text files, Excel and the web.
• Exploring various commands for descriptive analytics on the Iris dataset.
Before We Start
• Since the Anaconda distribution (which includes Anaconda Navigator) comes preloaded with most key data science packages (including NumPy, Pandas, SciPy, Statsmodels and Jupyter), we can skip the installation step and jump straight into the hands-on session.
STEP 1: Launch Jupyter Notebook via Anaconda Navigator
• Open Anaconda Navigator
• Click Launch under Jupyter Notebook
• A browser window will open → Click New → Python 3 (ipykernel)
Run a quick test to confirm the preinstalled packages are available:

import numpy as np
import pandas as pd
import scipy

print(np.__version__)
print(pd.__version__)
print(scipy.__version__)
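The quick test above only checks three packages. A slightly broader check, sketched below with the standard-library importlib.metadata module, reports the installed version of all five packages named in Exercise 1 and flags any that are missing instead of stopping with an ImportError. It assumes each distribution name matches the package name; Jupyter in particular may be installed as notebook or jupyterlab rather than the jupyter metapackage, in which case it would show as not installed here.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed to match the package names used in this lab
packages = ["numpy", "scipy", "pandas", "statsmodels", "jupyter"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

for name, ver in installed_versions(packages).items():
    print(f"{name:12s} {ver or 'NOT INSTALLED'}")
```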
Packages
A) NumPy (numpy)
NumPy stands for Numerical Python. It performs fast mathematical calculations on arrays of numbers. Think of it as a super-powered calculator!
Example:
import numpy as np
a = np.array([1, 2, 3])
print("Mean:", np.mean(a))
Output:
Mean: 2.0
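NumPy's speed comes from operating on whole arrays at once instead of looping in Python. A small illustration (the second array b is a made-up example):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])  # hypothetical second array

print(a + b)                      # element-wise addition: [11 22 33]
print(a * 2)                      # the scalar 2 is applied to every element: [2 4 6]
print(a.sum(), a.min(), a.max())  # 6 1 3
```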
Packages
B) SciPy (scipy)
• SciPy is used for scientific computing, especially statistics, signal processing, optimization and more.
• Example:
from scipy import stats
group1 = [60, 65, 70]
group2 = [80, 85, 90]
t_stat, p_val = stats.ttest_ind(group1, group2)
print("t-statistic:", t_stat)
Output:
t-statistic: -4.898979485566357
NumPy vs SciPy
NumPy:
import numpy as np
data = [1, 2, 3, 4, 5]
mean = np.mean(data)
print("Mean:", mean)
Output:
Mean: 3.0

SciPy:
from scipy import stats   # or: import scipy.stats as stats
group1 = [60, 65, 70]
group2 = [80, 85, 90]
t_stat, p_val = stats.ttest_ind(group1, group2)
print("t-statistic:", t_stat)
print("p-value:", p_val)
Output:
t-statistic: -4.898979485566357
p-value: 0.008049893100837717

Use NumPy (np) for general numerical operations and basic statistics. Use from scipy import stats when you need statistical functions that go beyond what NumPy offers, such as probability distributions, hypothesis tests and confidence intervals.
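As a sketch of the "beyond NumPy" functions mentioned above, the snippet below uses scipy.stats to build a 95% confidence interval for the mean of the same data list and to look up a value from the standard normal distribution. It assumes a t-based interval is appropriate for this tiny sample; it is an illustration, not one of the lab's required steps.

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5])
mean = data.mean()
sem = stats.sem(data)  # standard error of the mean (uses ddof=1)

# 95% interval from the t distribution with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, len(data) - 1, loc=mean, scale=sem)
print(f"Mean: {mean}, 95% CI: ({ci_low:.3f}, {ci_high:.3f})")  # Mean: 3.0, 95% CI: (1.037, 4.963)

# Cumulative probability of the standard normal distribution at 1.96
print(round(stats.norm.cdf(1.96), 3))  # 0.975
```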
C) Jupyter Notebook (jupyter)
• Jupyter is an interactive environment where we write code, see the output instantly, and explain our work with text, images or charts.
• Launch it from Anaconda Navigator (see STEP 1)
• A notebook mixes Markdown cells (formatted text) and Code cells (Python)
• Notebooks are saved as .ipynb files
The Iris dataset: each flower has 4 features:
Feature         Description            Unit
sepal length    Length of the sepal    cm
sepal width     Width of the sepal     cm
petal length    Length of the petal    cm
petal width     Width of the petal     cm

Target (label): the species of the flower, a categorical value:
• 0 → Setosa
• 1 → Versicolor
• 2 → Virginica
(The CSV loaded below stores species as the text labels setosa, versicolor and virginica rather than as 0/1/2.)

Dataset size:
• 150 samples
• 50 samples per species
• No missing values
D) Pandas (pandas)
• Pandas lets us read, clean and analyze tabular data easily. It gives us DataFrames, which work like Excel sheets in Python.
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")
df.head()
Output:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
E) Statsmodels (statsmodels)
• Statsmodels is used for running statistical models such as regression, ANOVA and time series analysis.
import statsmodels.api as sm
X = df['sepal_length']
y = df['petal_length']
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())
What are we trying to do?
• To predict petal_length from sepal_length using a straight-line formula:
  Petal Length = something + something × Sepal Length
• The computer finds the line that best fits the data using a method called OLS (Ordinary Least Squares), which minimizes the sum of the squared differences between the actual and predicted values.

Term           coef      Meaning in Simple Words
const          -7.1014   The starting value (intercept). If sepal_length = 0, petal_length would be -7.1; this is just part of the formula, since no flower has a sepal length of 0.
sepal_length   1.8584    For every 1 cm increase in sepal_length, petal_length increases by about 1.86 cm.
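To see what "finds the line that best fits the data" means, here is a minimal OLS sketch in NumPy; it is not the Statsmodels implementation. It adds a column of ones (the role sm.add_constant plays) and solves the least-squares problem with np.linalg.lstsq. The toy data are hypothetical points placed exactly on the line −7.1 + 1.86 × x, so the fit should recover those two numbers.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of 1s for the intercept, then x
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Hypothetical sepal lengths, with petal lengths placed exactly on -7.1 + 1.86 * x
x = np.array([4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
y = -7.1 + 1.86 * x

intercept, slope = ols_fit(x, y)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")  # intercept = -7.10, slope = 1.86
```

With real, scattered data the residuals are no longer zero, and lstsq returns the line with the smallest total squared error.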
What do the other columns mean?
Column          Meaning
std err         How much the coefficient estimate might vary. Smaller is better.
t               A score that tells whether the coefficient is far from zero (coef ÷ std err).
P>|t|           The p-value: the probability of getting a t at least this far from zero if the true coefficient were zero.
[0.025 0.975]   The 95% confidence interval: a range of plausible values for the true coefficient.

p-value   Interpretation
< 0.01    Very strong evidence against the null → highly significant
< 0.05    Enough evidence to reject the null → statistically significant
> 0.05    Not enough evidence → the difference might be due to chance
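The P>|t| column is a two-sided p-value computed from t and the degrees of freedom. A sketch with scipy.stats.t.sf, reusing the t-statistic from the group1/group2 t-test earlier (3 + 3 samples, so 4 degrees of freedom under the equal-variance assumption), should reproduce the p-value that ttest_ind printed. The interpret helper is a hypothetical function encoding the thresholds in the table above.

```python
from scipy import stats

def two_sided_p(t_stat, df):
    """Two-sided p-value: probability that |T| >= |t_stat| for T ~ t(df)."""
    return 2 * stats.t.sf(abs(t_stat), df)

def interpret(p):
    """Hypothetical helper applying the p-value thresholds from the table."""
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "statistically significant"
    return "not significant (might be due to chance)"

# t-statistic from the group1 vs group2 t-test; df = 3 + 3 - 2 = 4
p = two_sided_p(-4.898979485566357, 4)
print(p, interpret(p))
```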
What We Should Focus On
• Look at the coef column (the values in the equation):
  Petal Length = −7.1 + 1.86 × Sepal Length
  So if a flower has sepal_length = 6 cm:
  Petal Length ≈ −7.1 + 1.86 × 6 = 4.06 cm
• Look at the P>|t| value: if it is less than 0.05, the term is statistically significant. Both values here are shown as 0.000 (that is, below 0.0005), so both terms are highly significant.
• Look at R-squared = 0.760: this means 76% of the variation in petal length can be explained just by knowing sepal length.
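The worked prediction and the R-squared idea can both be written out directly. This sketch hard-codes the unrounded coefficients from the summary (with them, 6 cm gives about 4.05 cm), and r_squared is a hypothetical helper computing 1 − SS_res / SS_tot, the quantity Statsmodels reports as R-squared.

```python
import numpy as np

INTERCEPT = -7.1014  # const coef from the OLS summary
SLOPE = 1.8584       # sepal_length coef from the OLS summary

def predict_petal_length(sepal_length_cm):
    """Predicted petal length (cm) from the fitted straight line."""
    return INTERCEPT + SLOPE * sepal_length_cm

def r_squared(y_true, y_pred):
    """Share of the variation in y_true explained by y_pred: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

print(round(predict_petal_length(6.0), 2))  # 4.05
```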
SUMMARY
Package Use
NumPy Fast math, arrays, stats
SciPy Statistical tests, scientific computing
Jupyter Interactive code notebook
Pandas Data manipulation and analysis
Statsmodels Statistical modeling like regression
STEP 4: Explore the Dataset (Descriptive Analytics)
• Show basic details: df.info()

• Summary statistics: df.describe()

• Unique species and counts: df['species'].value_counts()

• Mean values by species: df.groupby('species').mean()

• Correlation between the numeric columns: df.corr(numeric_only=True)

• Number of rows and columns: df.shape

• Duplicate rows: df.duplicated() returns True for each row that duplicates an earlier row.
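To see what some of these commands return without downloading anything, the sketch below builds a tiny DataFrame from the five rows shown by df.head() earlier, plus a repeated copy of the first row that is added only to demonstrate df.duplicated(). It is not the full Iris dataset.

```python
import pandas as pd

cols = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "setosa"),
    (5.0, 3.6, 1.4, 0.2, "setosa"),
    (5.1, 3.5, 1.4, 0.2, "setosa"),  # deliberate copy of the first row
]
sample = pd.DataFrame(rows, columns=cols)

print(sample.shape)                    # (6, 5)
print(sample.duplicated().tolist())    # [False, False, False, False, False, True]
print(sample.drop_duplicates().shape)  # (5, 5)
print(sample["species"].value_counts())
print(sample.groupby("species").mean())
```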

STEP 5: Use NumPy for Numerical Operations
# Mean of sepal length
np.mean(df['sepal_length'])

# Standard deviation of petal width

np.std(df['petal_width'])
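np.std and the pandas .std() method use different defaults: NumPy divides by n (ddof=0, the population formula) while pandas divides by n − 1 (ddof=1, the sample formula), so the two can disagree on the same column. The hypothetical values below make the difference visible.

```python
import numpy as np
import pandas as pd

widths = np.array([0.2, 0.2, 0.2, 0.2, 0.4])  # hypothetical petal widths (cm)

population_std = np.std(widths)       # divides by n (ddof=0)
sample_std = np.std(widths, ddof=1)   # divides by n - 1 (ddof=1)
pandas_std = pd.Series(widths).std()  # pandas default is ddof=1

print(population_std, sample_std, pandas_std)
```

For the 150-row Iris data the gap is small, but on small samples it matters, so pass ddof explicitly when the two libraries are mixed.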
STEP 6: Use SciPy for a Statistical Test (Example)
# T-test between petal lengths of setosa and versicolor
setosa = df[df['species'] == 'setosa']['petal_length']
versicolor = df[df['species'] == 'versicolor']['petal_length']

stats.ttest_ind(setosa, versicolor)
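ttest_ind assumes by default that both groups have equal variances. Passing equal_var=False runs Welch's t-test instead, which drops that assumption; this is a swapped-in variant for comparison, not part of the original exercise. The petal lengths below are hypothetical stand-ins for two species.

```python
from scipy import stats

# Hypothetical petal lengths (cm): a tight group and a more spread-out group
group_a = [1.4, 1.4, 1.3, 1.5, 1.4]
group_b = [4.7, 4.5, 4.9, 4.0, 4.6]

student = stats.ttest_ind(group_a, group_b)                 # assumes equal variances
welch = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

With equal group sizes (as here, and as with 50 setosa vs 50 versicolor flowers) the two t-statistics coincide; only the degrees of freedom, and therefore the p-value, change.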
STEP 7: Use Statsmodels for Regression
# Simple Linear Regression: Predict petal_length using sepal_length
X = df['sepal_length']
y = df['petal_length']
X = sm.add_constant(X) # Adds intercept

model = sm.OLS(y, X).fit()

print(model.summary())
STEP 8: Make It Visual
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df, hue='species')
plt.show()
Basic Descriptions
(These commands use the DiabetesPedigreeFunction column of the Pima Indians Diabetes dataset; with the Iris DataFrame, substitute a numeric column such as 'sepal_length'. For value_counts(), replace 'column_name' with the column to count.)

frequency = df['column_name'].value_counts()
print(frequency)

mean = df['DiabetesPedigreeFunction'].mean()
print(mean)

median = df['DiabetesPedigreeFunction'].median()
print(median)

variance = df['DiabetesPedigreeFunction'].var()
print(variance)

std_dev = df['DiabetesPedigreeFunction'].std()
print(std_dev)

skewness = df['DiabetesPedigreeFunction'].skew()
print(skewness)

kurtosis = df['DiabetesPedigreeFunction'].kurtosis()
print(kurtosis)
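The same statistics can be collected in one call with Series.agg. The series below is hypothetical data standing in for any numeric column (such as DiabetesPedigreeFunction or sepal_length). Note that pandas reports the sample variance and standard deviation (ddof=1) and excess kurtosis, which is 0 for a normal distribution.

```python
import pandas as pd

# Hypothetical values standing in for any numeric column
s = pd.Series([0.2, 0.4, 0.4, 0.5, 0.7, 1.1, 2.3])

print(s.value_counts())  # 0.4 appears twice, every other value once

# One row per statistic: mean, median, sample variance, sample std, skewness, excess kurtosis
summary = s.agg(["mean", "median", "var", "std", "skew", "kurt"])
print(summary)
```

Here the mean (0.8) sits above the median (0.5) and the skewness is positive, both signs of the long right tail created by the value 2.3.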
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load a dataset
df = sns.load_dataset("iris") # Example dataset

# Compute the correlation matrix

corr = df.corr(numeric_only=True)

# Create the heatmap

sns.heatmap(corr, annot=True, cmap="coolwarm", linewidths=0.5)

plt.title("Correlation Heatmap")
plt.show()
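Each cell of that heatmap is a Pearson correlation coefficient, the default method of df.corr(). The hypothetical helper pearson_r below computes it from its definition (the co-variation of x and y divided by the product of their spreads) on the sepal and petal lengths from the df.head() rows shown earlier, and checks it against pandas. For a one-predictor regression such as the Statsmodels example, R-squared equals the square of this correlation.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation: summed co-deviations divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Sepal and petal lengths from the five df.head() rows
small = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
})

r = pearson_r(small["sepal_length"], small["petal_length"])
print(r)
print(small.corr().loc["sepal_length", "petal_length"])  # same value from pandas
```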
Data from Text or Excel
• Load from .txt (tab-separated):
df_txt = pd.read_csv('myfile.txt', delimiter='\t')
• Load from .xlsx (pandas needs the openpyxl package for .xlsx files; Anaconda normally includes it):
df_excel = pd.read_excel('Book1.xlsx')
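To try the tab-delimited reader without creating a file, the sketch below feeds pd.read_csv an in-memory io.StringIO buffer holding made-up tab-separated text; in the lab you would pass the real file name instead. sep='\t' and delimiter='\t' are interchangeable here.

```python
import io

import pandas as pd

# Made-up tab-separated text standing in for the contents of myfile.txt
text = "sepal_length\tspecies\n5.1\tsetosa\n7.0\tversicolor\n"

df_txt = pd.read_csv(io.StringIO(text), delimiter="\t")
print(df_txt)
print(df_txt.dtypes)  # sepal_length is parsed as a float column, species as text
```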
Summary Table
Task                      Command
Load CSV                  pd.read_csv(url)
View top rows             df.head()
Basic info                df.info()
Summary stats             df.describe()
Group by                  df.groupby()
Mean using NumPy          np.mean()
T-test                    stats.ttest_ind()
Regression                sm.OLS().fit()
Correlation coefficient   df.corr()
Correlation heatmap       sns.heatmap()
Creating Own Kernel (Virtual Environment) in Navigator by Installing using Terminal
1. Open Anaconda Navigator
• Click Start → search for Anaconda Navigator → open it.
• Wait for it to load fully.
2. Create a New Environment (Recommended)
• Creating a new environment keeps things clean.
• Click Environments (left side).
• Click Create (bottom).
• Name it: data_analysis_env
• Choose: Python 3.10 or 3.11
• Click Create (wait a bit).
3. Install Required Packages
Let’s install the following:
• numpy
• scipy
• pandas
• statsmodels
• jupyter
Option A: Using the Terminal

• Select your environment (data_analysis_env)

• Click Open Terminal (right side)
• In the terminal, type:
conda install numpy scipy pandas statsmodels jupyter
Option B: Using the Environments Tab (GUI)
• Click Channels and select conda-forge
• Use the search bar to find each library (e.g., numpy)
• Select it → Apply
Creating Own Kernel (Virtual Environment) in Navigator by Installing using GUI
4. Launch Jupyter Notebook
• From the Home tab, select the environment dropdown (top right).
• Make sure data_analysis_env is selected.
• Click Launch → Jupyter Notebook
This will open a browser window (localhost:8888).
Click New → Python 3 to open a new notebook.
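To confirm that a notebook is really running inside data_analysis_env, a cell like the one below prints which Python interpreter the kernel uses. The exact paths depend on your installation; sys.prefix is expected to end in envs/data_analysis_env when the right environment is active, and CONDA_DEFAULT_ENV may be unset depending on how Jupyter was started.

```python
import os
import sys

def kernel_environment():
    """Describe the Python interpreter this notebook kernel is running on."""
    return {
        "python_version": sys.version.split()[0],
        "executable": sys.executable,
        "prefix": sys.prefix,  # environment folder, e.g. ending in envs/data_analysis_env
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # may be None
    }

for key, value in kernel_environment().items():
    print(f"{key:15s} {value}")
```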
