
Data Science Programs

Uploaded by

senthur kannan thirugnanasambanthan


Question: Create a Pandas program to read a CSV file, fill missing values with the column

mean, and group the data by a specified category to calculate the average of a numerical

column.

Answer:

import pandas as pd

# Read the CSV file into a DataFrame

file_path = '[Link]' # Replace with your CSV file path

data = pd.read_csv(file_path)

# Fill missing values in each numeric column with that column's mean

data = data.fillna(data.mean(numeric_only=True))

# Specify the category column and numerical column

category_column = 'Category' # Replace with the name of your category column

numerical_column = 'Value' # Replace with the name of your numerical column

# Group the data by the category column and calculate the average of the numerical column

grouped_data = data.groupby(category_column)[numerical_column].mean()

# Display the results

print("Average of numerical column grouped by category:")

print(grouped_data)
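
To try the same fill-and-group steps without a file on disk, here is a minimal sketch that reads a small inline CSV instead. The sample rows are made up for illustration, and the `Category` and `Value` column names simply mirror the placeholders used above.

```python
import io

import pandas as pd

# Hypothetical sample data; the second row has a missing Value
csv_text = """Category,Value
A,10
A,
B,30
B,50
A,20
"""

data = pd.read_csv(io.StringIO(csv_text))

# Replace missing numeric values with the mean of the observed values
data = data.fillna(data.mean(numeric_only=True))

# Average Value per Category after filling
grouped = data.groupby("Category")["Value"].mean()
print(grouped)
```

Note that the gap is filled with the mean of the whole column (27.5 here), not the mean of its own category, so the filled row pulls group A's average toward the overall mean.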

Question: Implement a k-nearest neighbors (KNN) classifier using scikit-learn to predict

labels from the Iris dataset, and evaluate the model's accuracy.
Answer:

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score

# Load the Iris dataset

iris = load_iris()

X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features for better performance

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

# Create the KNN classifier with k=3

knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier

knn.fit(X_train, y_train)

# Predict labels for the test set

y_pred = knn.predict(X_test)

# Evaluate the model's accuracy

accuracy = accuracy_score(y_test, y_pred)

# Display the accuracy

print("Accuracy of the KNN classifier:", accuracy)
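
For intuition about what a k-nearest neighbors classifier does, the idea can be sketched in plain Python: measure the distance from a query point to every training point, keep the k closest, and take a majority vote over their labels. This is a simplified illustration, not how scikit-learn implements it (the library uses optimized search structures and its own tie-breaking). The function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points."""
    # Pair each training point with its label, ordered by Euclidean distance
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # Most common label among the k nearest neighbors
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated clusters (made up for illustration)
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

near_first_cluster = knn_predict(points, labels, (1.1, 1.0))
near_second_cluster = knn_predict(points, labels, (5.0, 5.1))
print(near_first_cluster, near_second_cluster)
```

Because the vote depends on raw distances, features on larger scales dominate the result, which is why the scikit-learn program above standardizes the features first.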

Question: Write a Python program to load a CSV file into a Pandas DataFrame and display

summary statistics (mean, median, and mode) for numerical columns.

Answer:

import pandas as pd

# Load the CSV file into a DataFrame

file_path = '[Link]' # Replace with the path to your CSV file

data = pd.read_csv(file_path)

# Display the DataFrame

print("DataFrame:")

print(data)

# Calculate and display summary statistics for numerical columns

numerical_data = data.select_dtypes(include=['number'])

# Mean

mean_values = numerical_data.mean()
print("\nMean of numerical columns:")

print(mean_values)

# Median

median_values = numerical_data.median()

print("\nMedian of numerical columns:")

print(median_values)

# Mode

mode_values = numerical_data.mode()

print("\nMode of numerical columns:")

print(mode_values.iloc[0]) # Display the first mode for simplicity
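
The reason for taking `iloc[0]` is that `DataFrame.mode()` returns a DataFrame rather than a single row: a column with several equally frequent values gets one row per mode, and columns with fewer modes are padded with NaN. A small sketch with made-up values shows the shape of that result.

```python
import pandas as pd

# Column "a" has two modes (1 and 2); column "b" has one mode (5)
df = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [5, 5, 6, 7, 8]})

modes = df.mode()
print(modes)          # two rows; "b" is NaN in the second row

first_mode = modes.iloc[0]
print(first_mode)     # the smallest mode of each column
```

Taking only the first row therefore hides any additional modes, so print the full `mode()` result when ties matter.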

Question: Write a Dask program to load a large CSV file, filter the data based on specific

criteria, and save the results to a new CSV file.

Answer:

import dask.dataframe as dd

# Load the large CSV file into a Dask DataFrame

file_path = 'large_data.csv' # Replace with the path to your large CSV file

data = dd.read_csv(file_path)

# Define the filtering criteria (e.g., filter rows where 'column_name' > 50)

filtered_data = data[data['column_name'] > 50] # Replace 'column_name' and condition as needed

# Save the filtered data to a new CSV file

output_file_path = 'filtered_data.csv'

filtered_data.to_csv(output_file_path, single_file=True, index=False)

print(f"Filtered data has been saved to {output_file_path}")
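
For comparison, the same filter-and-save task can be sketched without Dask by streaming the file one row at a time with the standard-library `csv` module. This is a different technique from the Dask program above (single-threaded, no parallelism), shown only to illustrate how a file can be filtered without loading it fully into memory. The helper name `filter_csv` and the demo rows are hypothetical.

```python
import csv
import os
import tempfile


def filter_csv(src_path, dst_path, column, threshold):
    """Copy rows whose `column` value exceeds `threshold`, one row at a time."""
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Values are read as strings, so convert before comparing
            if float(row[column]) > threshold:
                writer.writerow(row)
                kept += 1
    return kept


# Demo on a small temporary file (made-up data)
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "large_data.csv")
dst_path = os.path.join(tmp_dir, "filtered_data.csv")
with open(src_path, "w", newline="") as f:
    f.write("id,column_name\n1,10\n2,75\n3,50\n4,120\n")

kept_rows = filter_csv(src_path, dst_path, "column_name", 50)
print(f"Kept {kept_rows} rows")
```

Only one row is held in memory at a time, at the cost of the parallel, partitioned execution that Dask provides.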

Question: Write a Python function to calculate the mean, median, and mode of a given list of

numerical values.

Answer:

from statistics import mean, median, mode, StatisticsError

def calculate_statistics(numbers):
    """
    Calculate the mean, median, and mode of a list of numerical values.

    Args:
        numbers (list): A list of numerical values.

    Returns:
        dict: A dictionary containing the mean, median, and mode.
    """
    if not numbers:
        return {"mean": None, "median": None, "mode": None}

    try:
        mode_value = mode(numbers)
    except StatisticsError:
        # Python < 3.8 raises StatisticsError when there is no unique mode;
        # Python 3.8+ returns the first mode encountered instead.
        mode_value = "No unique mode"

    return {
        "mean": mean(numbers),
        "median": median(numbers),
        "mode": mode_value,
    }

# Example usage

numbers = [10, 20, 20, 30, 40]

result = calculate_statistics(numbers)

print("Mean:", result["mean"])

print("Median:", result["median"])

print("Mode:", result["mode"])
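
When several values tie for most frequent, `statistics.mode` reports only one of them on Python 3.8 and later. If every tied value is needed, the standard library's `statistics.multimode` (also added in Python 3.8) returns them all as a list. A short sketch with made-up inputs:

```python
from statistics import multimode

# Two values (20 and 30) are tied for most frequent
tied = multimode([10, 20, 20, 30, 30])

# A single most frequent value still comes back as a list
single = multimode([10, 20, 20, 30, 40])

# An empty input yields an empty list instead of raising an error
empty = multimode([])

print(tied, single, empty)
```

Modes are returned in the order they first appear in the input, and the empty-list result means no separate empty-input check is needed for the mode.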
