
Lab Manual: Foundation of Data Science
Authors: J. Daphney Joann, M. Balasubramaniam

Experiment 1: Introduction to Python for Data Science

Aim:
To understand and implement basic Python programming constructs used in data science.

Algorithm:
1. Start Python IDE or Jupyter Notebook.
2. Create a Python program with basic syntax: input, output, loops.
3. Define variables and perform operations.
4. Run the program and observe the output.

Code:
# Basic Python program for input, loop and output
name = input("Enter your name: ")
print("Welcome", name)

print("Looping from 0 to 4:")


for i in range(5):
    print("Iteration", i)

Output:
Enter your name: Daphney
Welcome Daphney
Looping from 0 to 4:
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4

Result:
The basic Python constructs such as input, loops, and print statements were successfully
executed.
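Step 3 of the algorithm (defining variables and performing operations) can also be sketched on its own; the values below are illustrative:

```python
# Define variables and perform basic arithmetic operations
x = 10
y = 3
print("Sum:", x + y)                # 13
print("Quotient:", x / y)           # 3.333...
print("Integer division:", x // y)  # 3
print("Remainder:", x % y)          # 1
```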
Experiment 2: Data Structures in Python (List, Tuple, Dictionary)

Aim:
To learn and apply data structures like List, Tuple, and Dictionary in Python.

Algorithm:
1. Initialize a list, tuple, and dictionary with sample values.
2. Perform operations like accessing, slicing, and updating.
3. Print the results to understand behavior.

Code:
# List operations
fruits = ["apple", "banana", "cherry"]
print("Fruits list:", fruits)
fruits.append("orange")
print("Updated list:", fruits)

# Tuple operations
days = ("Mon", "Tue", "Wed")
print("Days tuple:", days)

# Dictionary operations
student = {"name": "John", "age": 21, "course": "Data Science"}
print("Student dictionary:", student)
print("Student Name:", student["name"])

Output:
Fruits list: ['apple', 'banana', 'cherry']
Updated list: ['apple', 'banana', 'cherry', 'orange']
Days tuple: ('Mon', 'Tue', 'Wed')
Student dictionary: {'name': 'John', 'age': 21, 'course': 'Data Science'}
Student Name: John

Result:
List, Tuple, and Dictionary were implemented successfully and their properties were
demonstrated.
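The algorithm also mentions slicing and updating, which the code above does not show; a minimal sketch with illustrative data:

```python
# List slicing: take a sub-list by index range (end index excluded)
fruits = ["apple", "banana", "cherry", "orange"]
print("Slice [1:3]:", fruits[1:3])  # ['banana', 'cherry']

# Dictionary updates: modify an existing key and add a new one
student = {"name": "John", "age": 21}
student["age"] = 22          # update an existing key
student["city"] = "Chennai"  # add a new key
print("Updated dictionary:", student)
```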

Experiment 3: Numpy Basics: Arrays and Vectorized Computations

Aim:
To learn how to use Numpy for array operations and vectorized computations.
Algorithm:
1. Import numpy as np.
2. Create arrays using numpy.
3. Perform basic arithmetic and vectorized operations.
4. Print and interpret results.

Code:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Array a:", a)
print("Array b:", b)
print("Sum:", a + b)
print("Product:", a * b)

Output:
Array a: [1 2 3]
Array b: [4 5 6]
Sum: [5 7 9]
Product: [ 4 10 18]

Result:
Numpy arrays and basic vectorized operations were demonstrated successfully.
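Vectorization also covers scalar broadcasting and whole-array aggregations; a short sketch on the same array:

```python
import numpy as np

a = np.array([1, 2, 3])

# Broadcasting: a scalar is applied element-wise without a loop
print("Scaled:", a * 10)             # [10 20 30]

# Vectorized aggregations over the whole array
print("Mean:", a.mean())             # 2.0
print("Dot product:", np.dot(a, a))  # 1*1 + 2*2 + 3*3 = 14
```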

Experiment 4: Data Manipulation using Pandas

Aim:
To perform data manipulation using pandas DataFrame.

Algorithm:
1. Import pandas as pd.
2. Create a DataFrame.
3. Perform operations like adding, updating, and deleting data.
4. Display the results.

Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [24, 27]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
df['Age'] = df['Age'] + 1
print("Updated DataFrame:")
print(df)

Output:
Original DataFrame:
Name Age
0 Alice 24
1 Bob 27
Updated DataFrame:
Name Age
0 Alice 25
1 Bob 28

Result:
Data was successfully manipulated using pandas DataFrame.
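The algorithm also lists adding and deleting data; a sketch of those operations on the same kind of DataFrame (the City values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})
df['City'] = ['Paris', 'London']   # add a new column
df = df.drop(columns=['City'])     # delete the column again
df = df[df['Age'] > 25]            # keep only rows matching a condition
print(df)
```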

Experiment 5: Data Visualization using Matplotlib and Seaborn

Aim:
To visualize data using matplotlib and seaborn libraries.

Algorithm:
1. Import required libraries.
2. Prepare data for plotting.
3. Use matplotlib and seaborn to create graphs.
4. Display the plots.

Code:
import matplotlib.pyplot as plt
import seaborn as sns

data = [5, 10, 15, 20]

# Line plot with matplotlib
plt.plot(data)
plt.title("Line Plot")
plt.show()

# Bar plot with seaborn
sns.barplot(x=list(range(len(data))), y=data)
plt.title("Bar Plot")
plt.show()

Output:
A line plot and a bar plot are displayed showing the trend of values.

Result:
Data visualization using matplotlib and seaborn was successfully implemented.
Experiment 6: Descriptive Statistics and Data Summary

Aim:
To compute summary statistics of a dataset.

Algorithm:
1. Import pandas.
2. Create a DataFrame with numerical values.
3. Use describe() to generate summary.
4. Print the result.

Code:
import pandas as pd
data = {'Score': [88, 92, 79, 93, 85]}
df = pd.DataFrame(data)
print(df.describe())

Output:
Score
count 5.000000
mean 87.400000
std 5.683309
min 79.000000
25% 85.000000
50% 88.000000
75% 92.000000
max 93.000000

Result:
Descriptive statistics were calculated successfully using pandas.
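The same statistics can also be computed individually rather than through describe(); note that pandas' std() returns the sample standard deviation (ddof=1):

```python
import pandas as pd

scores = pd.Series([88, 92, 79, 93, 85])
print("Mean:", scores.mean())      # 87.4
print("Median:", scores.median())  # 88.0
print("Std dev:", scores.std())    # sample standard deviation (ddof=1)
```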

Experiment 7: Handling Missing Data and Data Cleaning

Aim:
To handle and clean missing data using pandas.

Algorithm:
1. Import pandas.
2. Create a DataFrame with missing values.
3. Use functions like fillna() and dropna().
4. Observe changes.
Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 30]}
df = pd.DataFrame(data)
print("Original Data:")
print(df)
df_clean = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})
print("Cleaned Data:")
print(df_clean)

Output:
Original Data:
Name Age
0 Alice 25.0
1 Bob NaN
2 None 30.0
Cleaned Data:
Name Age
0 Alice 25.0
1 Bob 27.5
2 Unknown 30.0

Result:
Missing values were handled using fillna(), replacing the missing name with a default string and the missing age with the column mean.
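The algorithm also lists dropna(), which the code above does not use; a minimal sketch of dropping incomplete rows instead of filling them:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, None, 30]})
# dropna() removes every row that contains at least one missing value
dropped = df.dropna()
print(dropped)  # only the fully populated row (Alice) remains
```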

Experiment 8: Grouping, Merging and Aggregation with Pandas

Aim:
To perform grouping and merging operations on data using pandas.

Algorithm:
1. Create two DataFrames.
2. Merge them using merge().
3. Group the merged data and aggregate.
4. Display results.

Code:
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [85, 90]})
merged = pd.merge(df1, df2, on='ID')
print("Merged DataFrame:")
print(merged)
grouped = merged.groupby('Name').mean()
print("Grouped by Name:")
print(grouped)

Output:
Merged DataFrame:
ID Name Score
0 1 Alice 85
1 2 Bob 90
Grouped by Name:
ID Score
Name
Alice 1.0 85.0
Bob 2.0 90.0

Result:
Grouping and merging of data was demonstrated successfully.
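Beyond mean(), a groupby result can be aggregated with several functions at once via agg(); a sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B'], 'Score': [85, 95, 90]})
# agg() applies multiple aggregation functions in one call
summary = df.groupby('Team')['Score'].agg(['mean', 'max', 'count'])
print(summary)
```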

Experiment 9: Introduction to Data Preprocessing Techniques

Aim:
To apply preprocessing techniques like normalization and encoding.

Algorithm:
1. Create sample data.
2. Apply normalization using sklearn.
3. Apply encoding if necessary.
4. Print the results.

Code:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

data = {'Marks': [50, 80, 100]}

df = pd.DataFrame(data)
scaler = MinMaxScaler()
df[['Marks']] = scaler.fit_transform(df[['Marks']])
print(df)

Output:
Marks
0 0.00
1 0.60
2 1.00

Result:
Data normalization was applied successfully using sklearn.
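The algorithm also mentions encoding; one common option is sklearn's LabelEncoder, sketched here with illustrative grades (classes are assigned integer codes in sorted order):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'Grade': ['A', 'B', 'A', 'C']})
encoder = LabelEncoder()
# Classes are sorted before coding, so A -> 0, B -> 1, C -> 2
df['Grade_encoded'] = encoder.fit_transform(df['Grade'])
print(df)
```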

Experiment 10: Mini Project / Case Study on Exploratory Data Analysis (EDA)

Aim:
To explore a dataset using EDA techniques and summarize insights.

Algorithm:
1. Load dataset using pandas.
2. Visualize data distributions.
3. Use describe(), info(), value_counts().
4. Document key insights.

Code:
import pandas as pd
df = pd.read_csv('sample.csv')
print(df.info())
print(df.describe())
print(df['Category'].value_counts())

Output:
Displays dataset info, summary statistics, and category distribution.

Result:
EDA was performed successfully and insights were derived.
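Since sample.csv is not bundled with this manual, the same steps can be tried on an inline DataFrame; the column names and values here are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'A'],
    'Value': [10, 20, 15, 30, 25],
})
df.info()                             # column dtypes and non-null counts
print(df.describe())                  # summary statistics for numeric columns
print(df['Category'].value_counts())  # frequency of each category
```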
