LO3 Develop a machine learning
application using an appropriate
programming language or
machine learning tool for solving
a real-world problem
RAJAD SHAKYA
Data Analysis
● process of examining, cleaning, transforming, and
modeling data to discover useful information, draw
conclusions, and support decision-making.
● Data Cleaning (Handling Missing Values, Outlier
Detection)
● Data Transformation (Normalization/Scaling,
Encoding Categorical Variables)
NumPy
● Numerical Python is a powerful library for numerical
computations in Python.
● provides support for arrays, matrices, and a variety
of mathematical functions to operate on these data
structures
● pip install numpy.
NumPy
● import numpy as np
● np.__version__
● np.info(np.logspace)
NumPy Arrays
● Data Type: Homogeneous (all elements must be of
the same data type, e.g., integers, floats).
● Performance: Significantly faster than lists for
numerical computations (see the timing sketch below)
● Efficient use of memory because of homogeneous
data types and contiguous memory allocation.
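As a rough illustration of this speed difference, the sketch below times a Python-list sum against np.sum on an equivalent array (exact numbers depend on the machine):
import time
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

start = time.perf_counter()
sum(py_list)                  # element-by-element Python loop
list_time = time.perf_counter() - start

start = time.perf_counter()
np.sum(np_arr)                # vectorized loop over contiguous memory
numpy_time = time.perf_counter() - start

print("list sum: ", list_time)
print("numpy sum:", numpy_time)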
Creating Arrays
import numpy as np
# Creating arrays from lists
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr1)
print(arr2)
Creating Arrays
zeros = np.zeros((2, 3)) # Array of zeros
ones = np.ones((2, 3)) # Array of ones
arange = np.arange(0, 10, 2) # Array with a range
linspace = np.linspace(0, 1, 5) # Array with linearly spaced values
Array Attributes
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Array Shape:", arr.shape)
print("Array Dimensions:", arr.ndim)
print("Array Size:", arr.size)
print("Array Data Type:", arr.dtype)
Array Indexing and Slicing
arr = np.array([1, 2, 3, 4, 5])
# Indexing
print(arr[0]) # First element
print(arr[-1]) # Last element
# Slicing
print(arr[1:4]) # Elements from index 1 to 3
print(arr[:3]) # First three elements
print(arr[::2]) # Every second element
Array Indexing and Slicing
# Multi-dimensional array indexing
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2[0, 1])
# Element at first row, second column
print(arr2[:, 1])
# All elements in second column
Array Manipulation
arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped = arr.reshape((3, 2))
print(reshaped)
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
concatenated = np.concatenate((arr1, arr2))
print(concatenated)
Array Manipulation
# Split
split = np.split(concatenated, 2)
print(split)
# Flatten
flattened = arr.flatten()
print(flattened)
Array Operations
arr = np.array([1, 2, 3, 4])
# Element-wise operations
print(arr + 2) # Add 2 to each element
print(arr * 3) # Multiply each element by 3
print(arr ** 2) # Square each element
Array Operations
print(np.sum(arr)) # Sum of all elements
print(np.mean(arr)) # Mean of elements
print(np.min(arr)) # Minimum element
print(np.max(arr)) # Maximum element
print(np.std(arr)) # Standard deviation
Statistical and Mathematical Functions
arr = np.array([1, 2, 3, 4, 5])
# Statistical functions
print(np.mean(arr)) # Mean
print(np.median(arr)) # Median
print(np.var(arr)) # Variance
# Mathematical functions
print(np.sin(arr)) # Sine of each element
print(np.log(arr)) # Natural logarithm of each element
print(np.exp(arr)) # Exponential of each element
Broadcasting
arr1 = np.array([1, 2, 3])
arr2 = np.array([[4], [5], [6]])
# Broadcasting addition
result = arr1 + arr2
print(result)
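Broadcasting stretches the smaller array across the larger one when the shapes are compatible. A common data-preparation use is centering each column of a matrix by subtracting its column means (the values below are made up for illustration):
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
col_means = X.mean(axis=0)     # shape (3,)
X_centered = X - col_means     # (3, 3) minus (3,) broadcasts across rows
print(X_centered)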
Random Module
# Generating random numbers
random_num = np.random.rand(5)
# 5 random numbers between 0 and 1
print(random_num)
Random Module
# Random integers
random_ints = np.random.randint(1, 10, size=5)
# 5 random integers between 1 and 9 (the upper bound 10 is exclusive)
print(random_ints)
Random Module
np.random.seed(10)              # fix the seed so results are reproducible
a1 = np.random.randint(1, 10, (3, 3))
a2 = np.random.randint(1, 10, (3, 3))
np.dot(a1, a2)                  # matrix product of a1 and a2
a1.T                            # transpose of a1
Linear Algebra
A = np.array([[1, 2], [3, 4]])   # example matrices, assumed here so the snippet runs
B = np.array([[5, 6], [7, 8]])
det_A = np.linalg.det(A)
# Determinant of A
I = A * B
# Multiply corresponding elements of A and B
J = A / B
# Divide corresponding elements of A by B
K = np.sqrt(A)
# Square root of each element in A
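Note that A * B above multiplies corresponding elements; it is not matrix multiplication. A short sketch with the same example matrices, contrasting it with the matrix product and two other np.linalg routines:
matrix_prod = A @ B                        # matrix multiplication (same as np.dot(A, B))
A_inv = np.linalg.inv(A)                   # inverse of A (A must be square and non-singular)
x = np.linalg.solve(A, np.array([1, 2]))   # solve the linear system A x = b
print(matrix_prod)
print(A_inv)
print(x)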
Aggregations
sum_A = np.sum(A)
mean_A = np.mean(A)
max_A = np.max(A)
min_A = np.min(A)
sum_axis0 = np.sum(A, axis=0)
# Sum along axis 0 (columns)
mean_axis1 = np.mean(B, axis=1)
# Mean along axis 1 (rows)
Cheatsheet
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf
More
● np.eye(5)
● x = np.array([[0,1],
[2,3]])
● np.sum(x,axis=1)
● np.sqrt
● np.exp
Matplotlib
● Matplotlib is a comprehensive library for creating
static, animated, and interactive visualizations in
Python.
● useful for generating plots and charts
● pip install matplotlib
Matplotlib
● import matplotlib.pyplot as plt
● Line Plot
○ plt.plot().
○ x = [1, 2, 3, 4, 5]
○ y = [2, 3, 5, 7, 11]
○
○ plt.plot(x, y)
Matplotlib
● plt.title('Simple Line Plot')
● plt.xlabel('X-axis')
● plt.ylabel('Y-axis')
● plt.show()
● plt.plot(x, y, color='red')
● plt.plot(x, y, color='#FF5733')
Matplotlib
● plt.plot(x, y, linestyle='--') # Dashed line
● plt.plot(x, y, linestyle='-.') # Dash-dot line
● plt.plot(x, y, linestyle=':') # Dotted line
● plt.plot(x, y, marker='o') # Circle marker
● plt.plot(x, y, marker='s') # Square marker
● plt.plot(x, y, marker='^') # Triangle up marker
Matplotlib
● plt.plot(x, y, color='green', linestyle='--',
linewidth=2, marker='o', markersize=8)
● plt.plot(x, y, label='Prime Numbers')
● plt.legend()
Matplotlib
● x = [1, 2, 3, 4, 5]
● y = [2, 3, 5, 7, 11]
● plt.scatter(x, y)
● plt.title('Simple Scatter Plot')
● plt.xlabel('X-axis')
● plt.ylabel('Y-axis')
● plt.show()
Matplotlib
● plt.scatter(x, y, s=100)
● plt.scatter(x, y, color='blue')
● plt.scatter(x, y, c='green')
# Using 'c' as shorthand for color
● plt.scatter(x, y, c='#FF5733')
# Using HEX color codes
Matplotlib
● plt.scatter(x, y, marker='^') # Triangle up marker
Shortcuts
● plt.plot(X,Y,'r--')
● plt.show()
●
● # ro  -> red circle markers
● # r-  -> solid red line
● # r-- -> dashed red line
● # r:  -> dotted red line
Shortcuts
● X = np.arange(-16,16)
● Y = X **3
● plt.plot(X,Y,'b^-',linewidth=4.5)
Shortcuts
● X = np.arange(0,10,0.4)
● plt.plot(X,X,'r--',X,X**2,'bs',X,X**3,'g^')
● plt.show()
Shortcuts
● X = np.arange(0,10,0.4)
●
● plt.plot(X,X,'r--',label="x vs x")
● plt.plot(X,X**2,'bs',label="x vs x**2")
● plt.plot(X,X**3,'g^',label="x vs x**3")
● plt.xlabel("this is value of x")
● plt.ylabel("this is value of y")
● plt.title("graph for x vs x,x2,x3")
● plt.legend()
● plt.grid(True)
● plt.show()
Shortcuts
● Create a graph for each of the following equations
(all in one figure as well as individually); a starter sketch follows below.
1. Y = 4X^3 + 3X^2
2. Y = 3X^2 + 2X
3. Y = X + 5
(Assume any data range using np.arange)
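One possible starter sketch for this exercise (the arange range below is just an assumed choice):
import numpy as np
import matplotlib.pyplot as plt

X = np.arange(-5, 5, 0.1)            # assumed data range
Y1 = 4 * X**3 + 3 * X**2
Y2 = 3 * X**2 + 2 * X
Y3 = X + 5

# All three equations in one figure
plt.plot(X, Y1, 'r--', label='Y = 4X^3 + 3X^2')
plt.plot(X, Y2, 'b-', label='Y = 3X^2 + 2X')
plt.plot(X, Y3, 'g^', label='Y = X + 5')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()

# Individually, e.g. the first equation
plt.plot(X, Y1, 'r-')
plt.title('Y = 4X^3 + 3X^2')
plt.show()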
Bar Plot
● used to represent categorical data with rectangular bars.
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D']
values = [10, 15, 7, 10]
plt.bar(categories, values)
plt.title('Basic Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()
Bar Plot
● plt.barh(categories, values)
● plt.bar(categories, values,
color='skyblue',
edgecolor='black',
linewidth=1.5,
width=0.6)
Sub Plot
X = np.array([1,2,3,4])
plt.subplot(2,1,1)
plt.plot(X,X**2)
plt.subplot(2,1,2)
plt.plot(X,X**3)
Sub Plot
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
# First subplot
axs[0, 0].plot([1, 2, 3], [4, 5, 6], color='blue', linewidth=2)
axs[0, 0].set_title('First Plot')
axs[0, 1].plot([1, 2, 3], [6, 5, 4], color='green', linewidth=1.5)
axs[0, 1].set_title('Second Plot')
axs[1, 0].bar(['A', 'B', 'C'], [5, 6, 7], color='red')
axs[1, 1].scatter([1, 2, 3], [4, 5, 6], color='purple', marker='o')
plt.tight_layout()
plt.show()
Histogram
● used to represent the distribution of numerical data
by dividing it into bins.
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
plt.hist(data, bins=5) # bins can also be given as an array of bin edges
plt.title('Basic Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
Histogram
● plt.hist(data, bins=5, color='lightgreen',
edgecolor='black', linewidth=1.5)
● Line Width: linewidth
● Line Style: linestyle (solid, dashed, etc.)
● Color: color
● Marker: marker (o, s, ^, etc.)
Box plot
np.random.seed(10)
data = np.random.randn(100)
# Create box plot
plt.boxplot(data)
plt.title('Box Plot')
plt.ylabel('Values')
plt.show()
Pandas
● powerful Python library specifically designed for
data manipulation and analysis.
● offers two primary data structures:
○ Series (one-dimensional) and
○ DataFrame (two-dimensional)
Series
● Like a single column of a spreadsheet: a one-dimensional labeled array.
● collection of elements all of the same data type
along with an index (labels) to identify each
element.
● import pandas as pd
● data = [10, 20, 30, 40]
● my_series = pd.Series(data, index=['Apple',
'Banana', 'Cherry', 'Mango'])
● print(my_series)
DataFrame
● full spreadsheet or a two-dimensional table.
● consists of multiple Series (columns) with
potentially different data types and a set of labels
(index) for rows.
● List- or dictionary-based initialization (see the sketch below)
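As a sketch of dictionary-based initialization (the names and values below are made up, but the columns match those used on the following slides):
import pandas as pd

data = {
    'Name':  ['Asha', 'Bibek', 'Chandra', 'Dawa'],
    'Age':   [24, 31, 28, 35],
    'City':  ['Kathmandu', 'Pokhara', 'Kathmandu', 'Lalitpur'],
    'Score': [88, 92, 79, 85]
}
df = pd.DataFrame(data)
print(df)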
DataFrame
● df['Name']
● df[['Name', 'City']]
● # Accessing the first row
● print(df.iloc[0])
●
● # Accessing rows by index
● print(df.iloc[1:3]) # Rows 1 to 2
DataFrame
● df.loc[row_label, column_label]
● # Accessing rows by label
● print(df.loc[0]) # First row
● # Accessing multiple rows by label
● print(df.loc[1:2]) # Rows 1 to 2
● print(df.loc[:, ['Name', 'Score']])
● df.loc[df['Age'] > 25]
DataFrame
● df.iloc[row_index, column_index]
● df.iloc[0]
● df.iloc[0:2]
● df.iloc[0:2, [0, 3]]
● print(df.loc[1:3, ['Name', 'City']])
● print(df.iloc[1:3, [0, 2]])
DataFrame
● df['Score'] = df['Score'] + 5
● # Applying a lambda function to increase age by 1
● df['Age'] = df['Age'].apply(lambda x: x + 1)
● print(df)
● def multiply_by_2(x):
● return x * 2
●
● result = df.apply(multiply_by_2)
● print(result)
DataFrame
● df_sorted = df.sort_values(by='Age')
● df.sort_values(by=['City', 'Score'])
● df.groupby('City')['Score'].mean()
Functions
● print(df.head(2))
● print(df.tail(2))
● df.describe()
● df['City'].value_counts()
● df_dropped = df.drop('Age', axis=1)
● # Dropping a row
● df_dropped = df.drop(2, axis=0)
Functions
● df.groupby('City')['Score'].mean()
● data_with_na = {
● 'A': [1, 2, None, 4],
● 'B': [None, 2, 3, 4]
● }
● df_with_na = pd.DataFrame(data_with_na)
● dropped_na_df = df_with_na.dropna()
● print(dropped_na_df)
● filled_na_df = df_with_na.fillna(0)
● print(filled_na_df)
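fillna also accepts per-column values; for example, filling each column's missing entries with that column's mean (a simple form of imputation):
mean_filled_df = df_with_na.fillna(df_with_na.mean())
print(mean_filled_df)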
EDA : Iris Dataset
● df = pd.read_csv('Iris.csv')
● df.head()
● df.shape
● df.info()
● df.describe()
● df.isnull().sum()
EDA : Iris Dataset
● data = df.drop_duplicates(subset="Species")
● df.value_counts("Species")
● import seaborn as sns
● import matplotlib.pyplot as plt
● sns.countplot(x='Species', data=df)
● plt.show()
EDA : Iris Dataset
● import seaborn as sns
● import matplotlib.pyplot as plt
●
● sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm', hue='Species', data=df)
●
● # Placing Legend outside the Figure
● plt.legend(bbox_to_anchor=(1, 1), loc=2)
● plt.show()
EDA : Iris Dataset
sns.pairplot(df.drop(['Id'], axis = 1),
hue='Species', height=2)
fig, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['SepalLengthCm'], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['SepalWidthCm'], bins=5);
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['PetalLengthCm'], bins=6);
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['PetalWidthCm'], bins=6);
EDA : Iris Dataset
data.corr(method='pearson', numeric_only=True)
# numeric_only skips the text Species column (requires pandas >= 1.5)
sns.heatmap(df.drop(['Id'], axis=1).corr(method='pearson', numeric_only=True), annot=True)
plt.show()
EDA : Iris Dataset
def graph(y):
sns.boxplot(x="Species", y=y, data=df)
plt.figure(figsize=(10,10))
plt.subplot(221)
graph('SepalLengthCm')
plt.subplot(222)
graph('SepalWidthCm')
plt.subplot(223)
graph('PetalLengthCm')
plt.subplot(224)
graph('PetalWidthCm')
plt.show()
EDA : Iris Dataset
df = pd.read_csv('Iris.csv')
sns.boxplot(x='SepalWidthCm', data=df)
Q1 = np.percentile(df['SepalWidthCm'], 25,
interpolation = 'midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75,
interpolation = 'midpoint')
IQR = Q3 - Q1
print("Old Shape: ", df.shape)
EDA : Iris Dataset
# Upper bound
upper = np.where(df['SepalWidthCm'] >= (Q3 + 1.5*IQR))
# Lower bound
lower = np.where(df['SepalWidthCm'] <= (Q1-1.5*IQR))
# Removing the Outliers
df.drop(upper[0], inplace = True)
df.drop(lower[0], inplace = True)
print("New Shape: ", df.shape)
sns.boxplot(x='SepalWidthCm', data=df)
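An alternative sketch for the same outlier removal using a boolean mask, which keeps only the in-range rows and does not rely on the default integer index:
mask = (df['SepalWidthCm'] > (Q1 - 1.5 * IQR)) & (df['SepalWidthCm'] < (Q3 + 1.5 * IQR))
df_no_outliers = df[mask]
print("New Shape: ", df_no_outliers.shape)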
EDA : Titanic Dataset
● titanic = pd.read_csv(url)
● titanic.head()
● titanic.info()
● titanic.isnull().sum()
● titanic['Age'] = titanic['Age'].fillna(titanic['Age'].median())
● titanic.drop(columns=['Cabin'], inplace=True)
EDA : Titanic Dataset
import seaborn as sns
import matplotlib.pyplot as plt
# Set up the plotting environment
sns.set(style='whitegrid')
# Count plot for 'Survived'
plt.figure(figsize=(8, 6))
sns.countplot(x='Survived', data=titanic)
plt.title('Survival Count')
plt.show()
EDA : Titanic Dataset
# Distribution plot for 'Age'
plt.figure(figsize=(8, 6))
sns.histplot(titanic['Age'], kde=True, bins=30)
plt.title('Age Distribution')
plt.show()
# Count plot for 'Pclass'
plt.figure(figsize=(8, 6))
sns.countplot(x='Pclass', data=titanic)
plt.title('Passenger Class Count')
plt.show()
EDA : Titanic Dataset
# Count plot for 'Sex'
plt.figure(figsize=(8, 6))
sns.countplot(x='Sex', data=titanic)
plt.title('Gender Count')
plt.show()
# Survival rate by 'Sex'
plt.figure(figsize=(8, 6))
sns.barplot(x='Sex', y='Survived', data=titanic)
plt.title('Survival Rate by Gender')
plt.show()
# Survival rate by 'Pclass'
plt.figure(figsize=(8, 6))
sns.barplot(x='Pclass', y='Survived', data=titanic)
plt.title('Survival Rate by Class')
plt.show()
EDA : Titanic Dataset
# Correlation matrix
plt.figure(figsize=(12, 8))
corr_matrix = titanic.corr(numeric_only=True)  # numeric columns only (Name, Sex, etc. are text)
sns.heatmap(corr_matrix, annot=True,
cmap='coolwarm', linewidths=0.2)
plt.title('Correlation Matrix')
plt.show()
EDA on Haberman's Survival Dataset
● What is the dataset about ?
● The dataset consists of several columns what do they mean?
● Get basic information about the dataset, including the number
of rows and columns and a preview of the data.
● Check if there are any missing values in the dataset
● Get basic statistics for numerical columns.
● Find the counts for each category in Survival_Status column.
● Visualize the distribution of numerical columns using
histograms and boxplots.
● Analyze the distribution of the survival status.
● Check the correlation between the features.
● Perform feature scaling on columns wherever needed (a starter sketch for these steps follows below).
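A possible starter sketch for the first few steps, assuming the file is saved as haberman.csv with columns named Age, Operation_Year, Axil_Nodes, and Survival_Status (rename to match your copy of the dataset):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('haberman.csv')                # assumed file name
print(df.shape)                                 # number of rows and columns
print(df.head())                                # preview of the data
print(df.isnull().sum())                        # missing values per column
print(df.describe())                            # basic statistics for numerical columns
print(df['Survival_Status'].value_counts())     # counts for each survival category

sns.histplot(df['Axil_Nodes'], bins=20)         # distribution of one numerical column
plt.show()
sns.boxplot(x='Survival_Status', y='Axil_Nodes', data=df)
plt.show()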
Categorical variables
● Nominal Data
○ categories with no inherent order or ranking.
○ Blood type (A, B, AB, O)
● Ordinal Data:
○ Represents categories with an inherent order
or ranking.
○ Education level (high school, bachelor's
degree, master's degree)
Feature encoding
● crucial step in the machine learning pipeline,
particularly when working with categorical data.
● Most ML algorithms require numerical input, so
categorical data must be converted into numerical
form.
● Label Encoding
● One-Hot Encoding
Label Encoding
● converts each unique category in a feature into an
integer value.
● useful for ordinal categorical variables where the
order of the categories is important.
● For example, the categories low, medium, and high
could be encoded as 0, 1, and 2, respectively.
● Can introduce ordinal relationships where none
exist, which can mislead some machine learning
models.
Label Encoding
import pandas as pd
from sklearn.preprocessing import LabelEncoder
data = {'color': ['red', 'blue', 'green', 'blue']}
df = pd.DataFrame(data)
label_encoder = LabelEncoder()
df['color_encoded'] = label_encoder.fit_transform(df['color'])
print(df)
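The fitted LabelEncoder keeps the category-to-integer mapping, which can be inspected and reversed:
print(label_encoder.classes_)                      # classes_[i] is the category encoded as i
print(label_encoder.inverse_transform([0, 1, 2]))  # map integers back to the original labels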
One-Hot Encoding
● converts categorical variables into a set of binary
variables (one-hot vectors).
● Each category is represented by a binary vector
where only one element is 1 and the rest are 0.
● useful for nominal categorical variables where no
ordinal relationship exists.
● Can result in high-dimensional data if the
categorical feature has many unique values.
One-Hot Encoding
import pandas as pd
data = {'color': ['red', 'blue', 'green', 'blue']}
df = pd.DataFrame(data)
df_one_hot = pd.get_dummies(df, columns=['color'])
print(df_one_hot)
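pd.get_dummies is the quickest route; scikit-learn's OneHotEncoder does the same job and can be reused on new data inside a pipeline. A minimal sketch (sparse_output assumes scikit-learn >= 1.2; older releases call the argument sparse):
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(df[['color']])    # note the double brackets: a 2-D input
print(encoder.get_feature_names_out(['color']))   # generated column names
print(encoded)                                    # one-hot rows as a NumPy array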
Feature Scaling
● process of normalizing or standardizing the range of
features in your dataset.
● prevents features with larger scales from
dominating the model's learning process.
● Min-Max Scaling (Normalization)
● Standardization (Z-score Normalization):
Min-Max Scaling (Normalization)
● Transforms features by scaling each feature to a
given range, usually [0, 1].
● X′ = (X − Xmin) / (Xmax − Xmin)
● Makes features easier to understand and interpret.
● Improves performance for distance-based algorithms (e.g., KNN, K-Means).
● Drawbacks: sensitive to outliers, and does not account for the variance of the data.
Min-Max Scaling (Normalization)
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
data = {'height': [150, 160, 170, 180, 190], 'weight':
[50, 60, 70, 80, 90]}
df = pd.DataFrame(data)
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df),
columns=df.columns)
print(df_scaled)
Standardization (Z-score Normalization)
● X′ = (X − μ) / σ, where μ is the mean and σ the standard deviation of the feature.
import pandas as pd
from sklearn.preprocessing import StandardScaler
data = {'height': [150, 160, 170, 180, 190], 'weight': [50,
60, 70, 80, 90]}
df = pd.DataFrame(data)
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df),
columns=df.columns)
print(df_scaled)
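After standardization each column should have mean ≈ 0 and standard deviation ≈ 1; a quick check (StandardScaler divides by the population standard deviation, hence ddof=0):
print(df_scaled.mean())        # approximately 0 for every column
print(df_scaled.std(ddof=0))   # approximately 1 for every column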
Thank You
RAJAD SHAKYA