ML Lab Manual

The document provides a comprehensive overview of Python programming for statistical analysis, focusing on central tendency measures (mean, median, mode) and measures of dispersion (variance, standard deviation) using built-in libraries such as statistics, math, NumPy, and SciPy. It includes example code, explanations of key functions, and answers to common questions regarding these libraries and their applications in data science and machine learning. Additionally, it introduces the Pandas library for data manipulation and preprocessing.

Uploaded by

Manikanta Koyalagudem


Program-1.

Write a Python program to compute Central Tendency Measures (Mean, Median, Mode) and Measures of
Dispersion (Variance, Standard Deviation).

Ans:

Source Code:

import statistics

def compute_measures(data):
    # Central tendency
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    mode_value = statistics.mode(data)
    # Dispersion (sample variance and sample standard deviation)
    variance_value = statistics.variance(data)
    std_deviation_value = statistics.stdev(data)

    print(f"Mean: {mean_value}")
    print(f"Median: {median_value}")
    print(f"Mode: {mode_value}")
    print(f"Variance: {variance_value}")
    print(f"Standard Deviation: {std_deviation_value}")

# Example dataset
data = [10, 20, 20, 30, 40, 50, 50, 50, 60, 70]
compute_measures(data)

OUTPUT:
Mean: 40
Median: 45.0
Mode: 50
Variance: 377.77777777777777
Standard Deviation: 19.436506316151
VIVA VOCE QUESTIONS & ANSWERS:

1. What are the central tendency measures, and why are they important?

Answer:
Central tendency measures indicate the center or typical value of a dataset. The three main measures are:

• Mean: The arithmetic average of the dataset.

• Median: The middle value when data is sorted.
• Mode: The most frequently occurring value.
These measures help summarize data and make comparisons across datasets.
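The three definitions above can be sketched directly in Python without the statistics module. The helper names (mean_by_hand, median_by_hand, modes_by_hand) are illustrative and not part of the manual; the dataset is the one used in Program-1.

```python
from collections import Counter

def mean_by_hand(values):
    # Arithmetic mean: sum of the values divided by how many there are.
    return sum(values) / len(values)

def median_by_hand(values):
    # Middle value of the sorted data; with an even count,
    # average the two middle values.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes_by_hand(values):
    # Every value that shares the highest frequency.
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

sample = [10, 20, 20, 30, 40, 50, 50, 50, 60, 70]
print(mean_by_hand(sample), median_by_hand(sample), modes_by_hand(sample))
```

Returning a list of modes keeps tied values visible instead of silently picking one.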

2. What are variance and standard deviation, and how do they differ?

Answer:
Both variance and standard deviation measure how spread out the data is.

• Variance is the average squared deviation from the mean.

• Standard Deviation is the square root of the variance, representing dispersion in the same unit as the data.
Standard deviation is more commonly used because it is easier to interpret.

3. How do you calculate the mean, median, and mode in Python?

Answer:
Python’s statistics module provides built-in functions:

import statistics

data = [10, 20, 30, 40, 50, 50]

mean_value = statistics.mean(data) # Computes Mean

median_value = statistics.median(data) # Computes Median
mode_value = statistics.mode(data) # Computes Mode

print(mean_value, median_value, mode_value)

4. What happens if there is more than one mode in the dataset?

Answer:
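The statistics.mode() function returns a single mode. In Python 3.8 and later, if several values tie for
most frequent, it returns the first one encountered in the data; earlier versions raised StatisticsError.
To get every mode, use statistics.multimode() (added in Python 3.8).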
Example:

data = [10, 20, 20, 30, 30, 40]

modes = statistics.multimode(data) # Returns [20, 30]

5. Why do we use (N-1) instead of N for sample variance?

Answer:
For sample variance, we divide by N−1 instead of N to correct for bias in estimating the population variance.
This is called Bessel's correction, and it makes the sample variance an unbiased estimator of the
population variance.

For population variance, we divide by N.
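A small sketch of the two divisors, computed by hand and compared with the statistics module. The dataset is illustrative; statistics.variance() and statistics.pvariance() are the standard-library functions for the sample and population versions.

```python
import statistics

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (n - 1)   # Bessel's correction: divide by N-1
population_variance = squared_deviations / n     # population formula: divide by N

print(sample_variance, statistics.variance(data))
print(population_variance, statistics.pvariance(data))
```

The sample value is always the larger of the two, and the gap shrinks as N grows.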

6. How do you calculate variance and standard deviation in Python?

Answer:
Using the statistics module:

import statistics

data = [10, 20, 30, 40, 50]

variance_value = statistics.variance(data) # Computes Variance
std_dev_value = statistics.stdev(data) # Computes Standard Deviation

print(variance_value, std_dev_value)

variance() returns the sample variance (the sum of squared deviations from the mean divided by N−1), while stdev() returns its square root.

7. How can you visualize central tendency and dispersion in Python?

Answer:
We can use Matplotlib or Seaborn to visualize data distributions.
Example:

import statistics
import matplotlib.pyplot as plt

import seaborn as sns

data = [10, 20, 30, 40, 50, 50, 60, 70]

sns.histplot(data, kde=True)
plt.axvline(statistics.mean(data), color='red', label="Mean")
plt.axvline(statistics.median(data), color='blue', linestyle="dashed", label="Median")
plt.legend()
plt.show()

This histogram shows the mean (red) and median (blue dashed) over the data distribution.
Program-2. Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy

Study of Python Basic Libraries: Statistics, Math, NumPy, and SciPy

Python provides several built-in and external libraries for mathematical and statistical computations. In this
detailed study, we will explore four key libraries:

1. Statistics Module (statistics)

2. Math Module (math)
3. NumPy Library (numpy)
4. SciPy Library (scipy)

1. Statistics Module (statistics)

The statistics module is a part of Python’s standard library and provides functions for basic statistical
analysis.

Key Functions and Their Usage

Function | Description | Example
mean(data) | Returns the arithmetic mean (average) | statistics.mean([1, 2, 3]) → 2
median(data) | Returns the middle value of sorted data | statistics.median([1, 2, 3, 4]) → 2.5
mode(data) | Returns the most frequent value | statistics.mode([1, 1, 2, 3]) → 1
variance(data) | Returns the sample variance | statistics.variance([1, 2, 3]) → 1
stdev(data) | Returns the sample standard deviation | statistics.stdev([1, 2, 3]) → 1.0

Example Code:
import statistics as stats

data = [10, 20, 30, 40, 50, 50]

print("Mean:", stats.mean(data)) # 33.33

print("Median:", stats.median(data)) # 35.0
print("Mode:", stats.mode(data)) # 50
print("Variance:", stats.variance(data)) # 266.67
print("Standard Deviation:", stats.stdev(data)) # 16.33

2. Math Module (math)

The math module provides mathematical functions like logarithms, trigonometry, and factorials.

Key Functions and Their Usage

Function | Description | Example
sqrt(x) | Returns the square root of x | math.sqrt(25) → 5.0
factorial(x) | Returns x! (factorial of x) | math.factorial(5) → 120
log(x, base) | Returns the logarithm of x to the given base | math.log(8, 2) → 3.0
sin(x), cos(x), tan(x) | Trigonometric functions (x in radians) | math.sin(math.radians(90)) → 1.0
gcd(a, b) | Returns the greatest common divisor of a and b | math.gcd(48, 18) → 6

Example Code:
import math

print("Square Root of 25:", math.sqrt(25)) # 5.0

print("Factorial of 5:", math.factorial(5)) # 120
print("Logarithm base 10 of 100:", math.log10(100)) # 2.0
print("Sine of 90 degrees:", math.sin(math.radians(90))) # 1.0
print("GCD of 48 and 18:", math.gcd(48, 18)) # 6

3. NumPy Library (numpy)

NumPy is a powerful library for numerical computations, especially for handling large arrays and matrices
efficiently.

Key Functions and Their Usage

Function | Description | Example
np.array([elements]) | Creates an array | np.array([1, 2, 3])
np.mean(arr) | Computes the mean | np.mean([1, 2, 3]) → 2.0
np.median(arr) | Computes the median | np.median([1, 2, 3]) → 2.0
np.std(arr) | Computes the standard deviation (population by default) | np.std([1, 2, 3]) → 0.816
np.var(arr) | Computes the variance (population by default) | np.var([1, 2, 3]) → 0.667
np.percentile(arr, q) | Computes the q-th percentile | np.percentile([1, 2, 3], 50) → 2.0

Example Code:
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print("Mean:", np.mean(arr)) # 30.0

print("Median:", np.median(arr)) # 30.0
print("Standard Deviation:", np.std(arr)) # 14.14
print("Variance:", np.var(arr)) # 200.0
print("25th Percentile:", np.percentile(arr, 25)) # 20.0
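Note that np.std() and np.var() divide by N by default, whereas statistics.stdev() and statistics.variance() divide by N−1. A minimal sketch of the difference, using NumPy's ddof ("delta degrees of freedom") parameter on the same illustrative values:

```python
import statistics
import numpy as np

values = [10, 20, 30, 40, 50]

population_std = np.std(values)        # ddof=0 (default): divide by N
sample_std = np.std(values, ddof=1)    # ddof=1: divide by N-1

print(population_std)             # about 14.14
print(sample_std)                 # about 15.81
print(statistics.stdev(values))   # agrees with the ddof=1 result
```

When comparing results across libraries, it helps to state which divisor was used.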

4. SciPy Library (scipy)

SciPy builds on NumPy and provides additional scientific computing tools, including statistics, optimization,
and linear algebra.

Key Functions and Their Usage

Function | Description | Example
stats.mode(data) | Computes the mode | stats.mode([1, 2, 2, 3]) → Mode: 2
stats.iqr(data) | Computes the interquartile range | stats.iqr([1, 2, 3, 4, 5]) → 2.0
stats.zscore(data) | Computes the Z-score | stats.zscore([10, 20, 30])
scipy.linalg.det(matrix) | Computes the determinant of a matrix | scipy.linalg.det([[1, 2], [3, 4]]) → approximately -2.0

Example Code:
from scipy import stats
import numpy as np

data = np.array([10, 20, 30, 40, 50, 50])

print("Mode:", stats.mode(data)) # ModeResult with mode 50 and count 2

print("Interquartile Range:", stats.iqr(data)) # 25.0
print("Z-scores:", stats.zscore(data)) # approx. [-1.57, -0.89, -0.22, 0.45, 1.12, 1.12]
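The interquartile range and Z-scores for this dataset can be reproduced by hand with NumPy alone. This is a sketch under two assumptions about defaults: stats.zscore() uses the population standard deviation (ddof=0), and stats.iqr() uses linearly interpolated percentiles.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 50])

# Z-score: distance from the mean in units of the population standard deviation.
z_manual = (data - data.mean()) / data.std()

# IQR: 75th percentile minus 25th percentile (linear interpolation).
q1, q3 = np.percentile(data, [25, 75])
iqr_manual = q3 - q1

print(np.round(z_manual, 2))
print(iqr_manual)  # 25.0
```

By construction the Z-scores have mean 0 and population standard deviation 1.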

Summary Table
Library | Purpose | Example Functions
statistics | Basic statistical analysis | mean(), median(), mode(), variance()
math | Mathematical operations | sqrt(), factorial(), log(), sin()
numpy | Numerical computations, array operations | mean(), std(), percentile()
scipy | Advanced scientific computing | stats.mode(), stats.iqr(), zscore()

Each of these libraries plays a crucial role in data science, engineering, and scientific computing.

https://chatgpt.com/share/67b4bda2-621c-8001-8a02-ec82f170170a
VIVA VOCE QUESTIONS AND ANSWERS

1. What is the difference between the math module and the statistics module in Python?

Answer:

• The math module provides basic mathematical functions such as logarithms, trigonometry, and power functions.
• The statistics module is specifically used for statistical computations like mean, median, mode, variance, and standard deviation.

Example:

import math
print(math.sqrt(16)) # 4.0

import statistics
print(statistics.mean([1, 2, 3, 4, 5])) # 3

2. What are the advantages of using NumPy over Python lists for numerical computations?

Answer:

• Speed: NumPy arrays are usually faster than lists for numerical work because their elements share one type and are stored contiguously.
• Memory Efficiency: NumPy arrays typically consume less memory than Python lists holding the same numbers.
• Vectorized Operations: NumPy applies operations to entire arrays without explicit Python loops.
• Built-in Functions: Supports advanced mathematical operations like linear algebra and Fourier transforms.

Example:

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr * 2) # Vectorized operation: [2 4 6 8]
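For contrast, the same `* 2` on a plain list repeats the list rather than doubling each element, so element-wise arithmetic on lists needs an explicit loop. A minimal sketch (variable names are illustrative):

```python
import numpy as np

numbers = [1, 2, 3, 4]

repeated = numbers * 2                     # list repetition, not arithmetic
doubled_loop = [n * 2 for n in numbers]    # element-wise result needs a loop
doubled_array = np.array(numbers) * 2      # element-wise in one vectorized step

print(repeated)
print(doubled_loop)
print(doubled_array)
```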

3. How do you calculate the mean and standard deviation using NumPy?

Answer:

• Use numpy.mean() for mean and numpy.std() for standard deviation.

import numpy as np
data = np.array([10, 20, 30, 40, 50])
print("Mean:", np.mean(data)) # 30.0
print("Standard Deviation:", np.std(data)) # 14.14

4. What is SciPy, and how does it differ from NumPy?

Answer:

• SciPy is built on top of NumPy and provides additional scientific computing functionalities like integration, optimization, and interpolation.
• NumPy focuses on efficient array operations, whereas SciPy extends NumPy to handle more advanced mathematical problems.

Example:

from scipy import linalg

import numpy as np
A = np.array([[1, 2], [3, 4]])
print(linalg.inv(A)) # Inverse of matrix A

5. How do you generate random numbers using NumPy?

Answer:

Use numpy.random module to generate random numbers.

import numpy as np
print(np.random.random()) # Random float between 0 and 1
print(np.random.randint(1, 10)) # Random integer between 1 and 9
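For repeatable experiments, NumPy also offers a seeded Generator object through np.random.default_rng(). The sketch below assumes a reasonably recent NumPy release; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator: same numbers on every run

random_float = rng.random()            # float in [0, 1)
random_int = rng.integers(1, 10)       # integer from 1 to 9 (upper bound excluded)
random_matrix = rng.random((2, 3))     # 2x3 array of floats in [0, 1)

print(random_float, random_int)
print(random_matrix)
```

Fixing the seed makes data splits and synthetic datasets reproducible between runs.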

6. What is the difference between mode() in statistics and SciPy’s stats.mode()?

Answer:

• statistics.mode() works on a simple list and returns the most common element.
• scipy.stats.mode() works efficiently on large datasets and multidimensional arrays.

Example:

import statistics
import scipy.stats as stats
data = [1, 2, 2, 3, 3, 3, 4]
print(statistics.mode(data)) # Output: 3

import numpy as np
arr = np.array([1, 2, 2, 3, 3, 3, 4])
print(stats.mode(arr)) # ModeResult with mode 3 and count 3 (exact repr depends on the SciPy version)

7. How do you perform integration using SciPy?

Answer:

Use scipy.integrate.quad() to integrate a function.

from scipy.integrate import quad

import numpy as np

def f(x):
    return np.sin(x)

result, _ = quad(f, 0, np.pi) # Integrates sin(x) from 0 to π

print(result) # Output: approximately 2.0
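To see what numerical integration does in principle, the same integral can be approximated with the composite trapezoidal rule in plain Python. This is a simpler technique than the adaptive routine behind quad(); the function name and step count are illustrative.

```python
import math

def trapezoid_integral(func, start, stop, steps=1000):
    # Composite trapezoidal rule: split [start, stop] into equal-width
    # strips and sum the areas of the trapezoids under the curve.
    width = (stop - start) / steps
    total = 0.5 * (func(start) + func(stop))
    for i in range(1, steps):
        total += func(start + i * width)
    return total * width

approx = trapezoid_integral(math.sin, 0.0, math.pi)
print(approx)  # close to 2.0
```

Increasing steps tightens the approximation at the cost of more function evaluations.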
8. How can you compute the determinant of a matrix using NumPy and SciPy?

Answer:

Use numpy.linalg.det() or scipy.linalg.det().

import numpy as np
from scipy import linalg

A = np.array([[3, 2], [1, 4]])

print(np.linalg.det(A)) # Using NumPy
print(linalg.det(A)) # Using SciPy

9. How do you find the roots of a quadratic equation using SciPy?

Answer:

Use numpy.roots() or scipy.optimize to find the roots.

import numpy as np
coefficients = [1, -3, 2] # x² - 3x + 2 = 0
print(np.roots(coefficients)) # Output: [2. 1.]
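For a quadratic specifically, the roots can also be obtained from the quadratic formula. The sketch below uses cmath so that a negative discriminant yields complex roots instead of an error; the function name is illustrative.

```python
import cmath

def quadratic_roots(a, b, c):
    # Roots of a*x^2 + b*x + c = 0 via x = (-b ± sqrt(b^2 - 4ac)) / (2a).
    if a == 0:
        raise ValueError("coefficient a must be non-zero for a quadratic")
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_discriminant) / (2 * a),
            (-b - root_of_discriminant) / (2 * a))

print(quadratic_roots(1, -3, 2))   # real roots 2 and 1, returned as complex numbers
print(quadratic_roots(1, 0, 1))    # complex pair for x² + 1 = 0
```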

10. How can you calculate a normal distribution PDF using SciPy?

Answer:

Use scipy.stats.norm.pdf() for the probability density function.

import scipy.stats as stats

print(stats.norm.pdf(0, loc=0, scale=1)) # PDF at x=0 for standard normal distribution
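The value returned by norm.pdf() follows from the normal density formula f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)). A minimal sketch with only the math module (the function name is illustrative):

```python
import math

def normal_pdf(x, mean=0.0, std_dev=1.0):
    # Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
    coefficient = 1.0 / (std_dev * math.sqrt(2.0 * math.pi))
    exponent = -((x - mean) ** 2) / (2.0 * std_dev ** 2)
    return coefficient * math.exp(exponent)

print(normal_pdf(0))  # about 0.3989
```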

These questions cover both conceptual and practical knowledge of Statistics, Math, NumPy, and SciPy in Python.
Program 3: Study of Python Libraries for ML application such as Pandas and Matplotlib

Study of Pandas - Python Library for Machine Learning Applications

Introduction

Pandas is a powerful and widely used Python library for data manipulation, preprocessing, and analysis in
Machine Learning (ML). It provides flexible data structures such as Series and DataFrame, which make
handling large datasets easier.

Why Use Pandas for ML?

✅ Efficient Data Handling: Supports large datasets.

✅ Data Cleaning & Preprocessing: Handles missing values, duplicates, and filtering.
✅ Integration: Works well with NumPy, Matplotlib, Scikit-learn.
✅ Feature Engineering: Aggregation, transformation, and encoding.
✅ Data Input/Output: Reads and writes from CSV, Excel, SQL, JSON.

Installing Pandas
pip install pandas

Pandas Data Structures

1. Series - One-Dimensional Data Structure

A Series is like a column in a table or an array with labeled indexes.

Example: Creating a Pandas Series

import pandas as pd

data = [10, 20, 30, 40]

series = pd.Series(data, index=['A', 'B', 'C', 'D'])
print(series)

Output:

A    10
B    20
C    30
D    40
dtype: int64

2. DataFrame - Two-Dimensional Data Structure

A DataFrame is like a spreadsheet with rows and columns.

Example: Creating a Pandas DataFrame

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000

Basic Operations in Pandas for ML

1. Reading and Writing Data

• Read CSV File

df = pd.read_csv('data.csv')

• Write DataFrame to CSV

df.to_csv('output.csv', index=False)

2. Data Selection and Filtering

• Select a Column

df['Name']

• Select Multiple Columns

df[['Name', 'Age']]

• Filter Rows Based on Condition

df[df['Age'] > 28]

3. Handling Missing Data

• Remove missing values

df.dropna()

• Fill missing values with a default value

df.fillna(0)
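A small sketch of both calls on a DataFrame that actually contains gaps. The table and its values are made up for illustration. Note that dropna() and fillna() return new DataFrames and leave the original unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'Salary': [50000, 60000, np.nan],
})

print(people.isna().sum())   # number of missing values per column

# Keep only rows with no missing values.
dropped = people.dropna()

# Fill each column with its own default: mean age, zero salary.
filled = people.fillna({'Age': people['Age'].mean(), 'Salary': 0})

print(dropped)
print(filled)
```

Filling per column is often preferable to a single default such as 0, which can distort features like age.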
4. Grouping and Aggregation

• Group by Column and Compute Mean

df.groupby('Age')['Salary'].mean()
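Grouping is easier to follow when the key column repeats. A minimal sketch with a hypothetical Department column (not in the manual's example data), showing a single aggregate and several aggregates at once via agg():

```python
import pandas as pd

staff = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR', 'IT'],
    'Salary': [40000, 60000, 70000, 50000, 80000],
})

# One aggregate per group.
mean_by_department = staff.groupby('Department')['Salary'].mean()

# Several aggregates per group in one call.
summary = staff.groupby('Department')['Salary'].agg(['mean', 'max', 'count'])

print(mean_by_department)
print(summary)
```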

5. Merging and Joining DataFrames

• Merge Two DataFrames

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['A', 'B', 'C']})

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Salary': [1000, 2000, 3000]})

merged_df = pd.merge(df1, df2, on='ID')

print(merged_df)

Pandas in Machine Learning Applications

1. Exploratory Data Analysis (EDA)

EDA is the first step in data preprocessing for ML models.

• Histograms help understand the distribution of features.

• Scatter plots help detect correlations.

Example:

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('sample_data.csv')

# Histogram of a feature
plt.hist(df['Feature1'], bins=20, color='blue', edgecolor='black')
plt.title("Feature Distribution")
plt.xlabel("Feature1")
plt.ylabel("Count")
plt.show()

2. Feature Engineering with Pandas

Feature engineering involves creating new features or modifying existing ones for ML models.

Example: Creating New Features

df['Age_Group'] = df['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')
print(df.head())
3. Encoding Categorical Data

ML models require numerical data, so categorical features must be encoded.

Example: One-Hot Encoding

df = pd.get_dummies(df, columns=['Gender'], drop_first=True)
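A self-contained sketch of what drop_first changes, using a made-up Gender column. With drop_first=True the first category's indicator column is dropped, since it can be inferred from the remaining ones.

```python
import pandas as pd

people = pd.DataFrame({'Gender': ['Male', 'Female', 'Female'],
                       'Age': [25, 30, 35]})

# One indicator column per category.
all_columns = pd.get_dummies(people, columns=['Gender'])

# Drop the first category (here 'Female') to avoid a redundant column.
reduced = pd.get_dummies(people, columns=['Gender'], drop_first=True)

print(list(all_columns.columns))
print(list(reduced.columns))
```

Whether the indicator values appear as booleans or 0/1 integers depends on the pandas version.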

4. Normalization & Scaling

Feature scaling puts features on comparable ranges, which helps models that rely on distances or gradient-based optimization.

Example: Min-Max Scaling

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df[['Salary']] = scaler.fit_transform(df[['Salary']])
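Min-max scaling itself is the formula (x − min) / (max − min), which maps a column onto the range [0, 1]. A sketch of the same computation in plain pandas, using the salary values from the earlier example:

```python
import pandas as pd

salaries = pd.Series([50000, 60000, 70000], name='Salary')

# Min-max scaling: smallest value maps to 0, largest to 1.
scaled = (salaries - salaries.min()) / (salaries.max() - salaries.min())

print(scaled.tolist())
```

In an ML workflow the minimum and maximum should come from the training data only, which is what fitting the scaler on the training set achieves.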

5. Train-Test Split for ML Models

Splitting the dataset into training and testing sets is essential for ML models.

Example: Splitting Data

from sklearn.model_selection import train_test_split

X = df[['Age', 'Salary']]
y = df['Target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Pandas Integration with Scikit-learn

Pandas is often used with Scikit-learn for building ML models.

Example: Training a Simple ML Model

from sklearn.linear_model import LogisticRegression

# Create Model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
predictions = model.predict(X_test)

Conclusion

• Pandas is an essential library for data preprocessing, analysis, and visualization in ML applications.
• It helps in cleaning, transforming, and preparing data for ML models.
• Works seamlessly with Scikit-learn, Matplotlib, and NumPy to build ML pipelines.
Study of Matplotlib - Python Library for Machine Learning Applications
Introduction

Matplotlib is a powerful data visualization library in Python. It is widely used in Machine Learning (ML),
Data Science, and Exploratory Data Analysis (EDA) to create a variety of graphs, charts, and plots.

Key Features of Matplotlib

✔️ Supports various types of plots: Line, Bar, Scatter, Histogram, Pie, etc.
✔️ Highly customizable: Colors, labels, styles, grids, and legends.
✔️ Supports interactive and animated visualizations.
✔️ Works well with NumPy, Pandas, and Seaborn.
✔️ Can generate plots for Jupyter Notebooks and GUI applications.

Installing Matplotlib
pip install matplotlib

Basic Structure of Matplotlib

Matplotlib mainly consists of the following components:

• Figure: The entire plotting area.

• Axes: The individual plots inside the figure.
• Plot elements: Lines, bars, markers, etc.

Example of Basic Matplotlib Usage:

import matplotlib.pyplot as plt

# Creating Data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 50]

# Plotting the Data

plt.plot(x, y, marker='o', linestyle='--', color='b', label="Line Plot")

# Adding Labels and Title

plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Basic Line Plot")
plt.legend()

# Display the Plot

plt.show()
Types of Plots in Matplotlib

1. Line Plot (Trend Analysis)

Used for time series data and trends in Machine Learning.

import numpy as np

x = np.linspace(0, 10, 100)

y = np.sin(x)

plt.plot(x, y, color="r", linestyle="-")

plt.title("Sine Wave")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()

2. Scatter Plot (Correlation Analysis)

Used in ML for visualizing relationships between two variables.

import numpy as np

# Generating random data

x = np.random.rand(50)
y = np.random.rand(50)

plt.scatter(x, y, color='g', marker='o')

plt.title("Scatter Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

3. Bar Chart (Category Comparison)

Used in classification problems to compare categories.

categories = ['A', 'B', 'C', 'D']

values = [10, 20, 30, 40]

plt.bar(categories, values, color=['red', 'blue', 'green', 'purple'])

plt.title("Bar Chart Example")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()

4. Histogram (Data Distribution)

Used for feature distribution analysis in ML.

data = np.random.randn(1000)

plt.hist(data, bins=30, color='blue', edgecolor='black')

plt.title("Histogram Example")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

5. Pie Chart (Category Proportion)

Used in categorical data analysis.

labels = ['Class A', 'Class B', 'Class C', 'Class D']

sizes = [30, 20, 40, 10]
colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen']

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)

plt.title("Pie Chart Example")
plt.show()

6. Subplots (Multiple Plots in One Figure)

Used to visualize multiple variables in ML models.

fig, ax = plt.subplots(2, 2, figsize=(8, 6))

# Line Plot
ax[0, 0].plot(x, y, 'r')
ax[0, 0].set_title("Line Plot")

# Scatter Plot
ax[0, 1].scatter(x, y, color='g')
ax[0, 1].set_title("Scatter Plot")

# Bar Chart
ax[1, 0].bar(categories, values, color='b')
ax[1, 0].set_title("Bar Chart")

# Histogram
ax[1, 1].hist(data, bins=20, color='purple')
ax[1, 1].set_title("Histogram")

plt.tight_layout()
plt.show()

Matplotlib in Machine Learning Applications

1. Exploratory Data Analysis (EDA)

EDA is the first step in data preprocessing for ML models.

• Histograms help understand the distribution of features.

• Scatter plots help detect correlations.

Example:

import pandas as pd
# Load dataset
df = pd.read_csv('sample_data.csv')

# Histogram of a feature
plt.hist(df['Feature1'], bins=20, color='blue', edgecolor='black')
plt.title("Feature Distribution")
plt.xlabel("Feature1")
plt.ylabel("Count")
plt.show()

2. Model Performance Visualization

Plotting loss and accuracy during ML model training.

epochs = [1, 2, 3, 4, 5]
train_loss = [0.9, 0.7, 0.5, 0.3, 0.2]
val_loss = [1.0, 0.8, 0.6, 0.4, 0.3]

plt.plot(epochs, train_loss, 'r', label="Train Loss")

plt.plot(epochs, val_loss, 'b', label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Training vs Validation Loss")
plt.legend()
plt.show()

Conclusion

• Matplotlib is an essential library for visualizing data in ML applications.

• It helps in data preprocessing, EDA, and model performance evaluation.
• Works seamlessly with Pandas, NumPy, and Seaborn for better ML workflows.

Program 4: Write a Python program to implement Simple Linear Regression

Regression

Regression is a statistical technique used in machine learning to model and analyze the relationship between
dependent (target) and independent (predictor) variables. The primary objective of regression is to predict a
continuous outcome based on input features.

There are different types of regression techniques, but the most common one is Linear Regression.
import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data

np.random.seed(42)

X = 2 * np.random.rand(100, 1) # Feature variable

Y = 4 + 3 * X + np.random.randn(100, 1) # Target variable with some noise

# Split data into training and testing sets

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Create and train the linear regression model

model = LinearRegression()

model.fit(X_train, Y_train)
# Make predictions

Y_pred = model.predict(X_test)

# Model evaluation

mse = mean_squared_error(Y_test, Y_pred)

r2 = r2_score(Y_test, Y_pred)

print(f"Coefficients: {model.coef_[0][0]}")

print(f"Intercept: {model.intercept_[0]}")

print(f"Mean Squared Error: {mse}")

print(f"R^2 Score: {r2}")

# Plot results

plt.scatter(X_test, Y_test, color='blue', label='Actual data')

plt.plot(X_test, Y_pred, color='red', linewidth=2, label='Regression Line')

plt.xlabel("X")

plt.ylabel("Y")

plt.legend()

plt.title("Simple Linear Regression")

plt.show()
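The fitted line can be cross-checked against the closed-form least-squares solution: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below generates its own synthetic data with a seeded generator (so the numbers differ from the program above) and compares the result with np.polyfit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 2 * rng.random(100)                      # feature values in [0, 2)
y = 4 + 3 * x + rng.normal(size=100)         # true line y = 4 + 3x plus noise

# Closed-form ordinary least squares for one feature.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Cross-check with NumPy's degree-1 polynomial fit.
fit_slope, fit_intercept = np.polyfit(x, y, 1)

print(slope, intercept)
print(fit_slope, fit_intercept)
```

Both estimates should land near the true values 3 and 4, with the gap caused by the added noise.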
Program 5: Implementation of Multiple Linear Regression for House Price Prediction using sklearn
Multiple Linear Regression:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data

np.random.seed(42)

X1 = 2 * np.random.rand(100, 1) # First feature

X2 = 3 * np.random.rand(100, 1) # Second feature

Y = 4 + 3 * X1 + 2 * X2 + np.random.randn(100, 1) # Target variable with noise

# Combine features into a single matrix

X = np.hstack((X1, X2))
# Split data into training and testing sets

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Create and train the multiple linear regression model

model = LinearRegression()

model.fit(X_train, Y_train)

# Make predictions

Y_pred = model.predict(X_test)

# Model evaluation

mse = mean_squared_error(Y_test, Y_pred)

r2 = r2_score(Y_test, Y_pred)

print(f"Coefficients: {model.coef_[0]}")

print(f"Intercept: {model.intercept_[0]}")

print(f"Mean Squared Error: {mse}")

print(f"R^2 Score: {r2}")

# Plot actual vs predicted values

plt.scatter(Y_test, Y_pred, color='blue', label='Predicted vs Actual')

plt.plot([Y_test.min(), Y_test.max()], [Y_test.min(), Y_test.max()], color='red', linestyle='--', label='Perfect Fit')

plt.xlabel("Actual Values")

plt.ylabel("Predicted Values")

plt.legend()

plt.title("Multiple Linear Regression - Actual vs Predicted")

plt.show()
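With several features, the same least-squares fit can be sketched with np.linalg.lstsq by adding a column of ones for the intercept. The synthetic data below is generated independently of the program above, so the exact numbers differ; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((100, 2)) * [2, 3]     # two features with different ranges
target = 4 + 3 * features[:, 0] + 2 * features[:, 1] + rng.normal(size=100)

# Prepend a column of ones so the first fitted weight is the intercept.
design = np.column_stack([np.ones(len(features)), features])

# Least-squares solution of design @ weights ≈ target.
weights, residuals, rank, singular_values = np.linalg.lstsq(design, target, rcond=None)
intercept, coefficients = weights[0], weights[1:]

print(intercept, coefficients)
```

The recovered weights should be close to the generating values (4, 3, 2), up to the noise.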
Study of Python Library for ML application such as Pandas


Pandas - Python Library for Data Manipulation and Analysis

Introduction to Pandas:

Pandas is an open-source Python library used for data manipulation, analysis, and preprocessing. It provides
fast, flexible, and powerful data structures like Series and DataFrame to work with structured data efficiently.
Pandas is widely used in data science, machine learning, financial analysis, and big data processing.

Key Features of Pandas

1. Data Structures:
o Series: A one-dimensional labeled array.
o DataFrame: A two-dimensional, tabular data structure (like a spreadsheet).
o Panel (removed): A three-dimensional structure that was deprecated and later removed from pandas.
2. Data Handling:
o Read and write from CSV, Excel, JSON, SQL, HTML, and more.
o Load large datasets and perform fast operations.
3. Data Cleaning and Transformation:
o Handle missing values (dropna(), fillna()).
o Remove duplicates (drop_duplicates()).
o Convert data types (astype()).
4. Filtering and Indexing:
o Select rows and columns using labels (loc[]) or positions (iloc[]).
o Apply boolean conditions for filtering.
5. Aggregation and Grouping:
o Use groupby() for grouping data and computing aggregate statistics.
o Perform pivot table operations (pivot_table()).
6. Data Visualization:
o Built-in support for plotting graphs using Matplotlib (df.plot()).
7. Time Series Analysis:
o Handle and manipulate datetime objects.
o Perform resampling, shifting, and rolling window calculations.
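As a brief sketch of the filtering and indexing features listed above, the example below contrasts label-based loc[] with position-based iloc[] on a small, made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 35], 'Salary': [50000, 60000, 70000]},
    index=['alice', 'bob', 'charlie'],
)

by_label = df.loc['bob', 'Salary']       # label-based: row 'bob', column 'Salary'
by_position = df.iloc[1, 1]              # position-based: second row, second column
label_slice = df.loc['alice':'bob']      # label slices include the end label
position_slice = df.iloc[0:1]            # position slices exclude the end position
older = df[df['Age'] > 28]               # boolean condition keeps matching rows

print(by_label, by_position)
print(older)
```

The differing slice rules are a common source of off-by-one mistakes when switching between loc[] and iloc[].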

Pandas Data Structures

1. Series - One-Dimensional Data Structure

A Series is similar to a list or an array, but with labels (index).

Example: Creating a Pandas Series

import pandas as pd

data = [10, 20, 30, 40]

series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)

Output:

a    10
b    20
c    30
d    40
dtype: int64

2. DataFrame - Two-Dimensional Data Structure

A DataFrame is a table-like structure with rows and columns, similar to an Excel spreadsheet.

Example: Creating a Pandas DataFrame

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
Basic Operations in Pandas

1. Reading and Writing Data

• Read CSV File

df = pd.read_csv('data.csv')

• Write DataFrame to CSV

df.to_csv('output.csv', index=False)

2. Data Selection and Filtering

• Select a Column

df['Name']

• Select Multiple Columns

df[['Name', 'Age']]

• Filter Rows Based on Condition

df[df['Age'] > 28]

3. Handling Missing Data

• Remove missing values

df.dropna()

• Fill missing values with a default value

df.fillna(0)

4. Grouping and Aggregation

• Group by Column and Compute Mean

df.groupby('Age')['Salary'].mean()

5. Merging and Joining DataFrames

• Merge Two DataFrames

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['A', 'B', 'C']})

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Salary': [1000, 2000, 3000]})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)
Data Visualization in Pandas

Pandas provides built-in support for visualization using Matplotlib.

Example: Plotting a Line Chart

import matplotlib.pyplot as plt

df.plot(x='Age', y='Salary', kind='line')

plt.show()
Conclusion

• Pandas is a crucial library for data preprocessing, analysis, and visualization.

• It helps in handling structured data efficiently for machine learning and data science applications.
• Its integration with NumPy, Matplotlib, and Scikit-learn makes it a preferred choice for data analysis and ML pipelines.

49 pages
Multiple Discriminant Analysis Explained
No ratings yet
Multiple Discriminant Analysis Explained
21 pages
Linear Regression Slides
No ratings yet
Linear Regression Slides
129 pages
Standard Costing for Managers
No ratings yet
Standard Costing for Managers
7 pages
LNMI MBA Syllabus 2020 Final
No ratings yet
LNMI MBA Syllabus 2020 Final
157 pages
Koen 2012
No ratings yet
Koen 2012
37 pages
Metallurgical Belt Samplers For Crushers Linear Samplers Rotary Vezin Secondary Tertiary Samplers
No ratings yet
Metallurgical Belt Samplers For Crushers Linear Samplers Rotary Vezin Secondary Tertiary Samplers
26 pages
Essoham Ali
No ratings yet
Essoham Ali
27 pages
Discrete Random Variables Recitation Guide
No ratings yet
Discrete Random Variables Recitation Guide
5 pages
IGNOU MBA MS-95 Solved Assignment Dec 2012
No ratings yet
IGNOU MBA MS-95 Solved Assignment Dec 2012
14 pages
Sampling Distribution in Statistics Module
No ratings yet
Sampling Distribution in Statistics Module
17 pages
Introduction To Statistics Using R
No ratings yet
Introduction To Statistics Using R
237 pages
Random Variables and Processes Overview
No ratings yet
Random Variables and Processes Overview
90 pages
QMB Exam 2 Review
No ratings yet
QMB Exam 2 Review
7 pages
Kordes - BA - BMS Internal External
No ratings yet
Kordes - BA - BMS Internal External
9 pages
Homework 5
No ratings yet
Homework 5
3 pages
CDS Candidate Guide 2020
No ratings yet
CDS Candidate Guide 2020
21 pages
DASS-21 Validity in Adults
No ratings yet
DASS-21 Validity in Adults
14 pages
BestPracticesforyourCFA v5
No ratings yet
BestPracticesforyourCFA v5
44 pages
Probability For Quants
No ratings yet
Probability For Quants
8 pages
Cancer Worry Scale in Breast Cancer
No ratings yet
Cancer Worry Scale in Breast Cancer
9 pages
BE Unit 2
No ratings yet
BE Unit 2
14 pages
Continuous Random Variables Guide
No ratings yet
Continuous Random Variables Guide
27 pages
Cost Variance Analysis Guide
No ratings yet
Cost Variance Analysis Guide
39 pages