Lab Manual: Foundation of Data Science
Authors: J. Daphney Joann, M. Balasubramaniam
Experiment 1: Introduction to Python for Data Science
Aim:
To understand and implement basic Python programming constructs used in data science.
Algorithm:
1. Start Python IDE or Jupyter Notebook.
2. Create a Python program with basic syntax: input, output, loops.
3. Define variables and perform operations.
4. Run the program and observe the output.
Code:
# Basic Python program for input, loop and output
name = input("Enter your name: ")
print("Welcome", name)
print("Looping from 0 to 4:")
for i in range(5):
    print("Iteration", i)
Output:
Enter your name: Daphney
Welcome Daphney
Looping from 0 to 4:
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Result:
The basic Python constructs such as input, loops, and print statements were successfully
executed.
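Step 3 of the algorithm calls for defining variables and performing operations, which the program above does not show; a minimal sketch (the names x and y are illustrative):

```python
# variables and basic arithmetic operations
x = 7
y = 3
print("Sum:", x + y)          # 10
print("Quotient:", x / y)     # true division always yields a float
print("Floor div:", x // y)   # 2
```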
Experiment 2: Data Structures in Python (List, Tuple, Dictionary)
Aim:
To learn and apply data structures like List, Tuple, and Dictionary in Python.
Algorithm:
1. Initialize a list, tuple, and dictionary with sample values.
2. Perform operations like accessing, slicing, and updating.
3. Print the results to understand behavior.
Code:
# List operations
fruits = ["apple", "banana", "cherry"]
print("Fruits list:", fruits)
fruits.append("orange")
print("Updated list:", fruits)
# Tuple operations
days = ("Mon", "Tue", "Wed")
print("Days tuple:", days)
# Dictionary operations
student = {"name": "John", "age": 21, "course": "Data Science"}
print("Student dictionary:", student)
print("Student Name:", student["name"])
Output:
Fruits list: ['apple', 'banana', 'cherry']
Updated list: ['apple', 'banana', 'cherry', 'orange']
Days tuple: ('Mon', 'Tue', 'Wed')
Student dictionary: {'name': 'John', 'age': 21, 'course': 'Data Science'}
Student Name: John
Result:
List, Tuple, and Dictionary were implemented successfully and their properties were
demonstrated.
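The algorithm above also mentions accessing and slicing, which the code does not show; a short sketch reusing the same sample values:

```python
fruits = ["apple", "banana", "cherry", "orange"]
print(fruits[1:3])     # slicing returns a new list
days = ("Mon", "Tue", "Wed")
print(days[0])         # tuples support indexing but are immutable
student = {"name": "John", "age": 21}
student["age"] = 22    # dictionaries support in-place updates
print(student["age"])
```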
Experiment 3: NumPy Basics: Arrays and Vectorized Computations
Aim:
To learn how to use NumPy for array operations and vectorized computations.
Algorithm:
1. Import numpy as np.
2. Create arrays using numpy.
3. Perform basic arithmetic and vectorized operations.
4. Print and interpret results.
Code:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Array a:", a)
print("Array b:", b)
print("Sum:", a + b)
print("Product:", a * b)
Output:
Array a: [1 2 3]
Array b: [4 5 6]
Sum: [5 7 9]
Product: [ 4 10 18]
Result:
NumPy arrays and basic vectorized operations were demonstrated successfully.
Experiment 4: Data Manipulation using Pandas
Aim:
To perform data manipulation using pandas DataFrame.
Algorithm:
1. Import pandas as pd.
2. Create a DataFrame.
3. Perform operations like adding, updating, and deleting data.
4. Display the results.
Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [24, 27]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
df['Age'] = df['Age'] + 1
print("Updated DataFrame:")
print(df)
Output:
Original DataFrame:
Name Age
0 Alice 24
1 Bob 27
Updated DataFrame:
Name Age
0 Alice 25
1 Bob 28
Result:
Data was successfully manipulated using pandas DataFrame.
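The algorithm also lists adding and deleting data, which the code above does not show; a minimal sketch (the City column is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})
df['City'] = ['Paris', 'London']   # add a new column
print(df)
df = df.drop(columns=['City'])     # delete the column
df = df.drop(index=1)              # delete a row by index label
print(df)
```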
Experiment 5: Data Visualization using Matplotlib and Seaborn
Aim:
To visualize data using matplotlib and seaborn libraries.
Algorithm:
1. Import required libraries.
2. Prepare data for plotting.
3. Use matplotlib and seaborn to create graphs.
4. Display the plots.
Code:
import matplotlib.pyplot as plt
import seaborn as sns
data = [5, 10, 15, 20]
# Line plot with matplotlib
plt.plot(data)
plt.title("Line Plot")
plt.show()
# Bar plot with seaborn
sns.barplot(x=list(range(len(data))), y=data)
plt.title("Bar Plot")
plt.show()
Output:
A matplotlib line plot and a seaborn bar plot are displayed, showing the trend and magnitudes of the values.
Result:
Data visualization using matplotlib and seaborn was successfully implemented.
Experiment 6: Descriptive Statistics and Data Summary
Aim:
To compute summary statistics of a dataset.
Algorithm:
1. Import pandas.
2. Create a DataFrame with numerical values.
3. Use describe() to generate summary.
4. Print the result.
Code:
import pandas as pd
data = {'Score': [88, 92, 79, 93, 85]}
df = pd.DataFrame(data)
print(df.describe())
Output:
Score
count 5.000000
mean 87.400000
std 5.683309
min 79.000000
25% 85.000000
50% 88.000000
75% 92.000000
max 93.000000
Result:
Descriptive statistics were calculated successfully using pandas.
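The describe() summary can be cross-checked by computing individual statistics directly; a short sketch using the same scores:

```python
import pandas as pd

scores = pd.Series([88, 92, 79, 93, 85])
print("Mean:", scores.mean())      # 87.4
print("Median:", scores.median())  # 88.0
print("Std:", scores.std())        # sample standard deviation (ddof=1)
```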
Experiment 7: Handling Missing Data and Data Cleaning
Aim:
To handle and clean missing data using pandas.
Algorithm:
1. Import pandas.
2. Create a DataFrame with missing values.
3. Use functions like fillna() and dropna().
4. Observe changes.
Code:
import pandas as pd
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 30]}
df = pd.DataFrame(data)
print("Original Data:")
print(df)
df_clean = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})
print("Cleaned Data:")
print(df_clean)
Output:
Original Data:
Name Age
0 Alice 25.0
1 Bob NaN
2 None 30.0
Cleaned Data:
Name Age
0 Alice 25.0
1 Bob 27.5
2 Unknown 30.0
Result:
Missing data was handled using fillna and replaced with default values.
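Step 3 of the algorithm also names dropna(), which the code above does not use; a minimal sketch on the same data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 30]}
df = pd.DataFrame(data)
# dropna() removes every row that contains any missing value
print(df.dropna())
# subset= limits the missing-value check to chosen columns
print(df.dropna(subset=['Age']))
```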
Experiment 8: Grouping, Merging and Aggregation with Pandas
Aim:
To perform grouping and merging operations on data using pandas.
Algorithm:
1. Create two DataFrames.
2. Merge them using merge().
3. Group the merged data and aggregate.
4. Display results.
Code:
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [85, 90]})
merged = pd.merge(df1, df2, on='ID')
print("Merged DataFrame:")
print(merged)
grouped = merged.groupby('Name').mean()
print("Grouped by Name:")
print(grouped)
Output:
Merged DataFrame:
ID Name Score
0 1 Alice 85
1 2 Bob 90
Grouped by Name:
ID Score
Name
Alice 1.0 85.0
Bob 2.0 90.0
Result:
Grouping and merging of data was demonstrated successfully.
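Beyond mean(), groupby supports several aggregations in one call via agg(); a brief sketch (the Dept column and its values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Dept': ['CS', 'CS', 'EE'], 'Score': [85, 90, 78]})
# agg() applies multiple aggregation functions to each group at once
summary = df.groupby('Dept')['Score'].agg(['mean', 'max', 'count'])
print(summary)
```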
Experiment 9: Introduction to Data Preprocessing Techniques
Aim:
To apply preprocessing techniques like normalization and encoding.
Algorithm:
1. Create sample data.
2. Apply normalization using sklearn.
3. Apply encoding if necessary.
4. Print the results.
Code:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
data = {'Marks': [50, 80, 100]}
df = pd.DataFrame(data)
scaler = MinMaxScaler()
df[['Marks']] = scaler.fit_transform(df[['Marks']])
print(df)
Output:
Marks
0 0.00
1 0.75
2 1.00
Result:
Data normalization was applied successfully using sklearn.
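The algorithm also mentions encoding, which the code above does not show; a minimal sketch of one-hot encoding with pandas get_dummies (the Grade values are illustrative; sklearn's OneHotEncoder is an alternative):

```python
import pandas as pd

df = pd.DataFrame({'Grade': ['A', 'B', 'A']})
# get_dummies creates one indicator column per category
encoded = pd.get_dummies(df, columns=['Grade'])
print(encoded)
```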
Experiment 10: Mini Project / Case Study on Exploratory Data Analysis (EDA)
Aim:
To explore a dataset using EDA techniques and summarize insights.
Algorithm:
1. Load dataset using pandas.
2. Visualize data distributions.
3. Use describe(), info(), value_counts().
4. Document key insights.
Code:
import pandas as pd
df = pd.read_csv('sample.csv')
df.info()
print(df.describe())
print(df['Category'].value_counts())
Output:
Displays dataset info, summary statistics, and category distribution.
Result:
EDA was performed successfully and insights were derived.
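Since sample.csv is not provided with this manual, the steps above can be tried on an inline stand-in dataset; the Category and Value columns below are illustrative:

```python
import pandas as pd

# inline stand-in for sample.csv
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'A', 'C'],
    'Value': [10, 25, 15, 30, 20],
})
df.info()                             # column dtypes and non-null counts
print(df.describe())                  # summary statistics for numeric columns
print(df['Category'].value_counts())  # distribution of categories
```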