310503: Statistics and Machine Learning
For a payroll dataset, compute the measures of central tendency and the corresponding measures of
dispersion for statistical analysis of the given data.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from google.colab import drive
# Mount Google Drive
drive.mount('/content/drive')
# Load the dataset
data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/csv/payroll dataset.csv')
# Descriptive Statistics
# Measures of Central Tendency
mean_salary = data['Salary'].mean()
median_salary = data['Salary'].median()
mode_salary = data['Salary'].mode()[0]
mid_range_salary = (data['Salary'].max() + data['Salary'].min()) / 2
# Measures of Dispersion
range_salary = data['Salary'].max() - data['Salary'].min()
variance_salary = data['Salary'].var()
mean_deviation_salary = np.mean(np.abs(data['Salary'] - mean_salary))
std_deviation_salary = data['Salary'].std()
# Tabulate the results
results = {
    'Measure': ['Mean', 'Median', 'Mode', 'Mid-Range', 'Range',
                'Variance', 'Mean Deviation', 'Standard Deviation'],
    'Value': [mean_salary, median_salary, mode_salary, mid_range_salary,
              range_salary, variance_salary, mean_deviation_salary,
              std_deviation_salary]
}
results_df = pd.DataFrame(results)
print("Descriptive Statistics:")
print(results_df)
# Plot the Salary distribution
plt.figure(figsize=(10, 6))
plt.hist(data['Salary'], bins=20, color='blue', alpha=0.7, edgecolor='black')
plt.axvline(mean_salary, color='red', linestyle='dashed', linewidth=1,
            label=f'Mean: {mean_salary:.2f}')
plt.axvline(median_salary, color='green', linestyle='dashed', linewidth=1,
            label=f'Median: {median_salary:.2f}')
plt.axvline(mode_salary, color='purple', linestyle='dashed', linewidth=1,
            label=f'Mode: {mode_salary:.2f}')
plt.xlabel('Salary')
plt.ylabel('Frequency')
plt.title('Salary Distribution with Central Tendency Measures')
plt.legend()
plt.show()
# Boxplot to visualize dispersion
plt.figure(figsize=(8, 6))
plt.boxplot(data['Salary'], vert=False, patch_artist=True,
            boxprops=dict(facecolor='lightblue'))
plt.title('Boxplot of Salary (Measures of Dispersion)')
plt.xlabel('Salary')
plt.show()
# Importance of Statistical Inference in Machine Learning
print("\nImportance of Statistical Inference in Machine Learning:")
print("""
1. **Descriptive Statistics**: Helps in understanding the data
distribution and summarizing the main features.
2. **Inferential Statistics**: Allows making predictions or inferences
about a population based on sample data.
3. **Model Evaluation**: Statistical inference is crucial for
evaluating model performance, understanding uncertainty, and making
data-driven decisions.
4. **Hypothesis Testing**: Used to validate assumptions and test the
significance of features in machine learning models.
5. **Confidence Intervals**: Provide a range of values within which the
true population parameter is expected to lie.
""")
Descriptive Statistics:
Measure Value
0 Mean 2.059147e+06
1 Median 2.500000e+05
2 Mode 2.500000e+04
3 Mid-Range 5.001500e+06
4 Range 9.997000e+06
5 Variance 1.004968e+13
6 Mean Deviation 2.610499e+06
7 Standard Deviation 3.170124e+06
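The output above shows a mean (about 2.06e+06) far larger than the median (2.50e+05), which indicates a strongly right-skewed salary distribution. As a minimal sketch of point 5 in the list printed by the program (confidence intervals), the snippet below estimates a 95% confidence interval for the mean salary. It is an optional illustration, not part of the assignment output, and it assumes the same DataFrame data with a 'Salary' column as loaded above, plus scipy.stats, which is available in Colab.
# Sketch (illustrative only): 95% confidence interval for the mean salary
from scipy import stats
salary = data['Salary'].dropna()
n = len(salary)
sample_mean = salary.mean()
standard_error = salary.std(ddof=1) / np.sqrt(n)
# t-based interval; for large n this is close to the normal approximation
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=standard_error)
print(f"95% CI for mean salary: ({ci_low:,.2f}, {ci_high:,.2f})")
A wide interval reflects greater uncertainty about the population mean; with a larger sample the interval narrows around the sample mean.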
Create a probabilistic model for credit card fraud detection.
Program:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, roc_curve)
import matplotlib.pyplot as plt
from google.colab import drive
drive.mount('/content/drive')
# Load the dataset
data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/csv/Credit Card Fraud Detection.csv')
# Display the first few rows of the dataset
print(data.head())
# Check for missing values
print("\nMissing values in the dataset:")
print(data.isnull().sum())
# Dataset information
print("\nDataset information:")
print(data.info())
# Class distribution (fraud vs non-fraud)
print("\nClass distribution:")
print(data['Class'].value_counts())
# Separate features (X) and target (y)
X = data.drop('Class', axis=1)
y = data['Class']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
# Standardize the features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train a Logistic Regression model
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train, y_train)
# Predict probabilities on the test set
y_pred_prob = model.predict_proba(X_test)[:, 1]  # Probability of fraud (Class 1)
# Predict classes on the test set
y_pred = model.predict(X_test)
# Evaluate the model
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# ROC-AUC Score
roc_auc = roc_auc_score(y_test, y_pred_prob)
print(f"\nROC-AUC Score: {roc_auc:.4f}")
# Plot ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve for Fraud Detection')
plt.legend()
plt.show()
# Example: Predict fraud probability for a new transaction
new_transaction = np.array([[0, -1.359807, -0.072781, 2.536347, 1.378155,
                             -0.338321, 0.462388, 0.239599, 0.098698, 0.363787,
                             0.090794, -0.551600, -0.617801, -0.991390, -0.311169,
                             1.468177, -0.470401, 0.207971, 0.025791, 0.403993,
                             0.251412, -0.018307, 0.277838, -0.110474, 0.066928,
                             0.128539, -0.189115, 0.133558, -0.021053, 149.62]])
new_transaction_scaled = scaler.transform(new_transaction)
fraud_probability = model.predict_proba(new_transaction_scaled)[0, 1]
print(f"\nFraud Probability for New Transaction:
{fraud_probability:.4f}")
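Credit card fraud data is highly imbalanced (far more legitimate transactions than frauds), so the default 0.5 decision threshold used by predict can miss many fraud cases. The sketch below is one possible extension, not part of the original program: it re-trains the logistic regression with class_weight='balanced' and applies a custom threshold to the predicted fraud probabilities. It assumes the X_train, X_test, y_train, y_test arrays and imports from above, and the 0.10 threshold is only an illustrative choice, not a recommended value.
# Optional extension (illustrative only): class weighting plus a custom
# decision threshold on the predicted fraud probabilities.
weighted_model = LogisticRegression(random_state=42, max_iter=1000,
                                    class_weight='balanced')
weighted_model.fit(X_train, y_train)
fraud_prob = weighted_model.predict_proba(X_test)[:, 1]
threshold = 0.10  # flag a transaction as fraud if P(fraud) exceeds this value
y_pred_threshold = (fraud_prob >= threshold).astype(int)
print(classification_report(y_test, y_pred_threshold))
Lowering the threshold trades more false positives for higher recall on the minority fraud class, which is usually the less costly error in fraud screening.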