TYCS Practical

Data science involves analyzing data to extract useful information and insights. Common techniques include wrangling, preprocessing, modeling, and visualizing data. This document discusses concepts like regression, clustering, principal component analysis and how to apply them in Python.

Data science

Vikas College Of Arts, Science and Commerce Page 1

INDEX

Sr No    Title    Date    Sign

1 Introduction to Excel

2 Data Frames and Basic Data Pre-processing

3 Feature Scaling and Dummification

4 Hypothesis Testing

5 ANOVA (Analysis of Variance)

6 Regression and Its Types

7 Logistic Regression and Decision Tree

8 K-Means Clustering

9 Principal Component Analysis (PCA)

10 Data Visualization and Storytelling


PRACTICAL 1
Introduction to Excel
A. Perform conditional formatting on a dataset using various criteria.

Steps
Step 1: Go to Conditional Formatting > Highlight Cells Rules > Greater Than.

Step 2: Enter the threshold value, for example 2000.

Step 3: In Conditional Formatting, go to Data Bars > Solid Fill.

B. Create a pivot table to analyse and summarize data.

Steps
Step 1: Select the entire table and go to Insert tab > PivotChart > PivotChart.

Step 2: Select “New Worksheet” in the Create PivotChart window.

Step 3: Select and drag the attributes into the boxes below.

C. Use VLOOKUP function to retrieve information from a different worksheet or table.
Steps
Step 1: Click on an empty cell and type the following formula.
=VLOOKUP(B3, B3:D3, 1, TRUE)

D. Perform what-if analysis using Goal Seek to determine input values for desired output.
Steps
Step 1: In the Data tab, go to What-If Analysis > Goal Seek.

Step 2: Fill in the Goal Seek window (Set cell, To value, By changing cell) and click OK.

PRACTICAL 2
Data Frames and Basic Data Pre-processing
A. Read data from CSV and JSON files into a data frame.
B. Perform basic data pre-processing tasks such as handling missing values and outliers.
Code:
import pandas as pd

# Reading CSV file into DataFrame
df = pd.read_csv("[Link]")
print("Our dataset:")
print(df)

# Reading JSON file into DataFrame
data = pd.read_json("[Link]")
print(data)

# Displaying the first 10 rows of the DataFrame
print(df.head(10))

# Filling missing values with 0 (returns a new DataFrame; df is unchanged)
print("Dataset after filling NA values with 0:")
df2 = df.fillna(value=0)
print(df2)

# Dropping rows with any missing values (modifies df in place)
print("Dataset after dropping NA values:")
df.dropna(inplace=True)
print(df)
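
The task statement for part B also mentions outliers. One common way to handle them is the interquartile range (IQR) rule, sketched below. The sample values and the column name "age" are assumptions made for illustration and are not taken from the original dataset.

```python
import pandas as pd

# Hypothetical sample data; the column name "age" is an assumption
df = pd.DataFrame({"age": [22, 25, 27, 24, 26, 23, 95]})

# Interquartile range (IQR) fences
q1 = df["age"].quantile(0.25)
q3 = df["age"].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Rows that fall outside the fences are treated as outliers
outliers = df[(df["age"] < lower) | (df["age"] > upper)]
print("Outliers:")
print(outliers)

# Option 1: drop the outlier rows
df_no_outliers = df[(df["age"] >= lower) & (df["age"] <= upper)]

# Option 2: cap the values at the fences instead of dropping them
df_capped = df.assign(age=df["age"].clip(lower=lower, upper=upper))
print(df_capped)
```

Whether to drop or cap depends on the data; dropping loses rows, while capping keeps every row but changes the extreme values.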


C. Manipulate and transform data using functions like filtering, sorting, and grouping.
Code:
import pandas as pd

# Reading CSV file into DataFrame

df = pd.read_csv("[Link]")

# Filtering data based on a condition (e.g., age greater than 25)

filtered_df = df[df["age"] > 25]

# Sorting data based on a column (e.g., sorting by age in descending order)

sorted_df = df.sort_values(by="age", ascending=False)

# Grouping data based on a column and applying an aggregation function (e.g., finding the average age per city)
grouped_df = df.groupby("city").agg({"age": "mean"})

# Displaying the filtered DataFrame

print("Filtered DataFrame:")
print(filtered_df)

# Displaying the sorted DataFrame

print("\nSorted DataFrame:")
print(sorted_df)

# Displaying the grouped DataFrame

print("\nGrouped DataFrame:")
print(grouped_df)
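
To try the same three operations without the CSV file, a small in-memory DataFrame can be used. The rows below are made-up sample values (an assumption for illustration); only the column names "age" and "city" come from the listing above.

```python
import pandas as pd

# Made-up sample rows; only the "age" and "city" column names come from the practical
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera", "Kiran"],
        "age": [22, 31, 28, 40],
        "city": ["Pune", "Mumbai", "Pune", "Mumbai"],
    }
)

# Filtering: keep rows where age is greater than 25
filtered_df = df[df["age"] > 25]

# Sorting: order rows by age, largest first
sorted_df = df.sort_values(by="age", ascending=False)

# Grouping: average age per city
grouped_df = df.groupby("city").agg({"age": "mean"})

print(filtered_df)
print(sorted_df)
print(grouped_df)
```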


PRACTICAL 3
Feature Scaling and Dummification
A. Apply feature-scaling techniques like standardization and normalization to numerical features.

Code:
# Standardization and normalization
import pandas as pd
import numpy as np
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import StandardScaler

print("Printing a few rows")

# Raw string so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r"D:\TYCS\Data Science\[Link]")
print(df.head())

# Column-wise maximum absolute value
print("Max values")
max_vals = np.max(np.abs(df), axis=0)
print(max_vals)
print((df - max_vals) / max_vals)

# Normalizer rescales each row (sample) to unit norm
print("Normalization")
scaler = Normalizer()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())

# StandardScaler rescales each column to zero mean and unit variance
print("Standardization")
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())
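
Note that scikit-learn's Normalizer works row by row, while "normalization" of features usually means rescaling each column to the range 0 to 1. A minimal sketch comparing the three scalers is given below; the small array is an assumed example, and MinMaxScaler is used here for column-wise normalization although the listing above does not use it.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Assumed example: two features on very different scales
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

# Min-max normalization: each column is rescaled to the range [0, 1]
minmax = MinMaxScaler().fit_transform(X)

# Standardization: each column gets mean 0 and standard deviation 1
standard = StandardScaler().fit_transform(X)

# Normalizer: each row is divided by its own L2 norm
unit_rows = Normalizer().fit_transform(X)

print("Min-max:\n", minmax)
print("Standardized:\n", standard)
print("Row-normalized:\n", unit_rows)
```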

B. Perform feature dummification to convert categorical variables into numerical representations.
Code:

import pandas as pd

data = pd.read_csv("[Link]")
# Select the categorical (object-typed) columns
categorical_features = data.select_dtypes(include="object")
# One-hot encode them
dummies = pd.get_dummies(categorical_features)
# Append the dummy columns and drop the original categorical columns
data = pd.concat([data, dummies], axis=1)
data.drop(columns=categorical_features.columns, inplace=True)
data.to_csv("[Link]")
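
The same result can be sketched in one call by passing the column list to pd.get_dummies. The DataFrame below is a made-up example (the "city" and "age" columns are assumptions), and drop_first is shown as an optional variation that is not part of the listing above.

```python
import pandas as pd

# Made-up example data
data = pd.DataFrame({"city": ["Mumbai", "Pune", "Mumbai"], "age": [25, 30, 35]})

# Encode the "city" column and drop the original in a single step
encoded = pd.get_dummies(data, columns=["city"], dtype=int)
print(encoded)

# Optional: drop the first level to avoid redundant (perfectly correlated) columns
encoded_reduced = pd.get_dummies(data, columns=["city"], drop_first=True, dtype=int)
print(encoded_reduced)
```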


Practical 4
Hypothesis Testing
Conduct a hypothesis test using appropriate statistical tests (e.g., t-test, chi-square test).
# t-test
import numpy as np
import scipy.stats as stats

# Simulated exam scores for two groups
np.random.seed(42)
scoreA = np.random.normal(loc=70, scale=10, size=30)
scoreB = np.random.normal(loc=75, scale=10, size=30)

# Independent two-sample t-test (assumes equal variances by default)
t_stat, pvalue = stats.ttest_ind(scoreA, scoreB)
print(f"T-Statistic: {t_stat}\nP-Value: {pvalue}")

alpha = 0.05
if pvalue < alpha:
    print("Reject the null hypothesis. There is a significant difference in exam scores.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in exam scores.")

Output:

# Chi-square test
import numpy as np
import scipy.stats as stats

# Observed 2x2 contingency table
observed_data = np.array([[25, 15], [20, 40]])
chi2, pvalue, dof, expected = stats.chi2_contingency(observed_data)
print(f"Chi-Square Statistic: {chi2}\nP-value: {pvalue}\nDegrees of Freedom: {dof}\nExpected frequencies:\n{expected}")

alpha = 0.05
if pvalue < alpha:
    print("Reject the null hypothesis. There is a significant association between gender and job satisfaction.")
else:
    print("Fail to reject the null hypothesis. Gender and job satisfaction are independent.")
Output:
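
To see where the expected frequencies come from, the chi-square statistic can be worked out by hand as a sketch: each expected count is (row total × column total) / grand total. The variable names below are illustrative. For 2x2 tables chi2_contingency applies Yates' continuity correction by default, so correction=False is passed here to compare against the uncorrected hand calculation.

```python
import numpy as np
from scipy import stats

observed = np.array([[25, 15], [20, 40]])

# Expected count for each cell = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
total = observed.sum()
expected = row_totals @ col_totals / total

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Same test in SciPy, without the continuity correction
chi2_scipy, p, dof, expected_scipy = stats.chi2_contingency(observed, correction=False)

print("Expected frequencies:\n", expected)
print("Manual chi-square:", chi2_manual)
print("SciPy chi-square: ", chi2_scipy)
```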


Practical 5
ANOVA (Analysis of Variance)
Perform one-way ANOVA to compare means across multiple groups.
from scipy.stats import f_oneway

# Define sample data for each group
group1 = [15, 20, 25, 30, 35]
group2 = [10, 18, 22, 28, 32]
group3 = [12, 16, 20, 24, 28]

# One-way ANOVA across the three groups
f_statistic, p_value = f_oneway(group1, group2, group3)

print("One-way ANOVA results:")
print("F-statistic:", f_statistic)
print("P-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print(
        "Reject null hypothesis: There are significant differences between the means of the groups."
    )
else:
    print(
        "Fail to reject null hypothesis: There are no significant differences between the means of the groups."
    )

Output:-
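
The F-statistic reported by f_oneway is the ratio of between-group variance to within-group variance. A minimal sketch that recomputes it by hand for the same three groups is given below; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway

groups = [
    np.array([15, 20, 25, 30, 35]),
    np.array([10, 18, 22, 28, 32]),
    np.array([12, 16, 20, 24, 28]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)      # number of groups
n = all_values.size  # total number of observations

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of values around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

f_scipy, p_value = f_oneway(*groups)
print("Manual F:", f_manual)
print("SciPy F: ", f_scipy)
```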


Practical 6
Regression and its Types.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Independent variable (predictor)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
# Dependent variable (response)
y = np.array([[7], [9], [11], [13], [15], [17], [19], [21], [23], [25]])

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Simple Linear Regression
model = LinearRegression()
model.fit(X_train, y_train)  # Fitting the model

# Coefficients
print("Intercept:", model.intercept_[0])
print("Coefficient:", model.coef_[0][0])

# Predictions
y_pred = model.predict(X_test)

# Model Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)

# Plotting the regression line
plt.scatter(X_test, y_test, color="blue")
plt.plot(X_test, y_pred, color="red")
plt.title("Simple Linear Regression")
plt.xlabel("Independent Variable (X)")
plt.ylabel("Dependent Variable (y)")
plt.show()

Output:
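
The listing above covers simple linear regression only. As one further type, polynomial regression can be sketched by expanding the predictor with PolynomialFeatures and fitting the same LinearRegression model. The quadratic data below is an assumed example, not data from the practical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Assumed example data following y = 2x^2 + 3 exactly
X = np.arange(1, 11).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + 3

# Expand x into the columns [x, x^2]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# A linear model on the expanded features is a polynomial model in x
model = LinearRegression().fit(X_poly, y)

print("Intercept:", model.intercept_)
print("Coefficients for [x, x^2]:", model.coef_)
```

Because the data is exactly quadratic, the fitted coefficients should recover the generating equation up to floating-point error.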


Practical 7
Logistic Regression and Decision Tree
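
A minimal sketch of the two classifiers named in the title, assuming scikit-learn's bundled Iris dataset. The dataset, the 80/20 split, and the hyperparameters (max_iter, max_depth) are assumptions chosen for illustration and do not come from the original practical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset: Iris (150 samples, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Logistic regression classifier
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
log_reg_acc = accuracy_score(y_test, log_reg.predict(X_test))

# Decision tree classifier with a small depth limit to keep the tree readable
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
tree_acc = accuracy_score(y_test, tree.predict(X_test))

print("Logistic Regression accuracy:", log_reg_acc)
print("Decision Tree accuracy:", tree_acc)
```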
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Generate sample data
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.60, random_state=0)

# Determine the optimal number of clusters using the silhouette score
silhouette_scores = []
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, random_state=0).fit(X)
    score = silhouette_score(X, kmeans.labels_)
    silhouette_scores.append(score)

# Plot the silhouette scores
plt.plot(range(2, 11), silhouette_scores, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Silhouette Score")
plt.title("Silhouette Score for Optimal Number of Clusters")
plt.show()

# Choose the optimal number of clusters based on the silhouette score
# (index 0 corresponds to k = 2, hence the + 2)
optimal_k = silhouette_scores.index(max(silhouette_scores)) + 2

# Apply K-Means clustering with the optimal number of clusters
kmeans = KMeans(n_clusters=optimal_k, random_state=0).fit(X)

# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap="viridis", s=50, alpha=0.7)
plt.scatter(
    kmeans.cluster_centers_[:, 0],
    kmeans.cluster_centers_[:, 1],
    s=200,
    c="red",
    marker="X",
    label="Centroids",
)
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()

# Analyze the cluster characteristics
silhouette_avg = silhouette_score(X, kmeans.labels_)
print(f"Silhouette Score: {silhouette_avg}")
Output:

Practical 8
K-Means clustering
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Load data
data = pd.read_csv("[Link]")

# Display the first few rows of the dataset
print(data.head())

# Define categorical and continuous features
categorical_features = ["Channel", "Region"]
continuous_features = [
    "Fresh",
    "Milk",
    "Grocery",
    "Frozen",
    "Detergents_Paper",
    "Delicassen",
]

# Descriptive statistics for continuous features
print(data[continuous_features].describe())

# Convert categorical features into dummy variables
for col in categorical_features:
    dummies = pd.get_dummies(data[col], prefix=col)
    data = pd.concat([data, dummies], axis=1)
    data.drop(col, axis=1, inplace=True)

# Display the first few rows of the updated dataset
print(data.head())

# Normalize the data (each column rescaled to the range 0 to 1)
mms = MinMaxScaler()
data_transformed = mms.fit_transform(data)

# Calculate the sum of squared distances for different values of k
sum_of_squared_distances = []
K = range(1, 15)
for k in K:
    km = KMeans(n_clusters=k)
    km.fit(data_transformed)
    sum_of_squared_distances.append(km.inertia_)

# Plot the elbow method graph
plt.plot(K, sum_of_squared_distances, "bx-")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Sum of Squared Distances")
plt.title("Elbow Method for Optimal k")
plt.show()

Output:


Practical 9
Principal Component Analysis (PCA)
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# Perform PCA
pca = PCA(n_components=2)  # Specify the number of components (dimensions)
X_r = pca.fit_transform(X)

# Create a DataFrame for visualization
df = pd.DataFrame(data=X_r, columns=['PC1', 'PC2'])
df['target'] = y

# Plot the data, one colour per species
plt.figure(figsize=(8, 6))
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(
        df.loc[df['target'] == i, 'PC1'],
        df.loc[df['target'] == i, 'PC2'],
        color=color,
        alpha=.8,
        lw=lw,
        label=target_name,
    )

plt.title('PCA of IRIS dataset')
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()

Output:
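
A useful follow-up to the plot is to check how much of the total variance the two retained components capture. The sketch below uses the explained_variance_ratio_ attribute of the fitted PCA object; this check is an addition and is not part of the original listing.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit PCA with two components, as in the practical
pca = PCA(n_components=2).fit(X)

# Fraction of the total variance captured by each component
ratios = pca.explained_variance_ratio_
print("Explained variance ratio per component:", ratios)
print("Total variance retained:", ratios.sum())
```

If the retained total is low for a given dataset, increasing n_components is the usual remedy.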

Practical 10
Data Visualization and Storytelling

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
# Assume '[Link]' contains your dataset
df = pd.read_csv("[Link]")

# Perform data analysis
# Example: Calculate summary statistics
summary_stats = df.describe()

# Create meaningful visualizations
# Example: Plot a histogram of a numerical variable
plt.figure(figsize=(8, 6))
sns.histplot(data=df, x="numerical_variable", bins=20, kde=True)
plt.title("Histogram of Numerical Variable")
plt.xlabel("Numerical Variable")
plt.ylabel("Frequency")
plt.show()

# Example: Plot a bar chart of a categorical variable
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x="categorical_variable", palette="viridis")
plt.title("Bar Chart of Categorical Variable")
plt.xlabel("Categories")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.show()

# Present findings and insights in a clear and concise manner
# Example: Use Markdown to format text for presentation
print("# Data Analysis and Visualization Report\n")
print("## Summary Statistics:\n")
print(summary_stats)
print("\n## Insights:\n")
print(
    "- The histogram shows that the distribution of the numerical variable is approximately normal."
)
print(
    "- The bar chart indicates that category A is the most frequent in the categorical variable."
)
print(
    "- The scatterplot suggests a positive correlation between numerical variables 1 and 2, with different categories showing distinct patterns.\n"
)

Output:
