PUML4PL02 DATA ENGINEERING LABORATORY L T P C
0 0 4 2
COURSE OBJECTIVES
To collect, preprocess, and clean data from diverse sources for analysis.
To analyze and optimize OS resource utilization and concurrency in data pipelines.
To design normalized database schemas and implement efficient queries.
To perform data analysis, modeling, and visualization using Python and BI tools.
To evaluate replication mechanisms, time series, and predictive models for data-driven decision making.
LIST OF EXERCISES
1. Data Collection from Various Sources.
2. Data Preprocessing and Cleaning.
3. Analyzing OS Resource Management for Data Pipelines (monitor CPU, memory, and I/O utilization while processing data).
4. Concurrency and Parallel Processing (process a large dataset using multiprocessing).
5. Create a normalized relational schema and insert sample data.
6. Optimize queries using indexes and analyze performance.
7. SQL Query Optimization and Execution Plan Analysis.
8. Time Series Analysis on Online Retail Dataset using Python.
9. Data Replication and Topology Simulation.
10. Regression on Student Dataset with Graphs.
11. Data Visualization on Pima Indian Diabetes Dataset.
12. E-Commerce Analytics Dashboard in Power BI.
Total Periods: 60
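As an illustration of Exercises 5 and 6, the following is a minimal sketch and not a prescribed solution. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL so that it runs without a database server. The table names, column names, index name, and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the MySQL server listed under SOFTWARE).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical normalized schema: each customer is stored once and referenced by orders.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.5)],
)


def query_plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


lookup = "SELECT amount FROM orders WHERE customer_id = 1"

# Compare the execution plan before and after adding an index on the filter column.
plan_before = query_plan(lookup)
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
plan_after = query_plan(lookup)

total_for_customer_1 = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer_id = 1"
).fetchone()[0]

print("Plan before index:", plan_before)
print("Plan after index: ", plan_after)
print("Total for customer 1:", total_for_customer_1)
```

In MySQL the same comparison would typically be made with its own EXPLAIN statement; the exact plan wording differs between database engines and versions.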
COURSE OUTCOMES:
At the end of the course, students will be able to:
COs   COGNITIVE LEVEL   COURSE OUTCOMES
CO1   Apply             Collect and preprocess data from multiple sources for efficient data analysis.
CO2   Analyze           Analyze OS-level resource utilization and implement concurrency for optimized data pipelines.
CO3   Apply             Design and implement normalized relational databases with optimized SQL queries.
CO4   Evaluate          Perform statistical, regression, and time-series analysis using Python for real-world datasets.
CO5   Create            Develop interactive dashboards and visualize analytical insights using Power BI and Python.
CO – PO Mapping:
COs   PO1   PO2   PO3   PO4   PO5   PO6   PO7   PO8   PO9   PO10  PO11  PO12  PSO1  PSO2
CO1   3     2     2     3     2     1     -     -     -     -     -     2     3     2
CO2   2     3     3     3     2     1     -     -     -     -     -     3     3     2
CO3   3     3     3     3     2     1     -     -     -     -     -     2     3     3
CO4   2     3     3     3     3     2     -     -     -     -     -     3     3     3
CO5   2     3     3     3     3     2     -     -     -     -     -     3     3     3
TEXTBOOKS:
1. "Fundamentals of Data Engineering" by Joe Reis and Matt Housley, O’Reilly Media, 2022.
2. "Database System Concepts" by Abraham Silberschatz, Henry F. Korth, and S. Sudarshan, McGraw Hill Education, 7th Edition, 2020.
REFERENCES:
1. "Python for Data Analysis" by Wes McKinney, O’Reilly Media, 3rd Edition, 2022.
2. "Power BI Data Analytics and Visualization" by Greg Deckler, Packt, 2021.
3. "Designing Data-Intensive Applications" by Martin Kleppmann, O’Reilly Media, 2017.
WEBSITE REFERENCE / NPTEL/ SWAYAM/ MOOC REFERENCE:
[Link]
HARDWARE:
Processor: Intel Core i7/i9 or equivalent.
Memory: 16 GB RAM (32 GB or more preferred).
Storage: 500 GB to 1 TB SSD, plus an HDD for additional storage.
GPU: NVIDIA RTX 3060 or higher, for visualization and AI tasks.
Network: stable internet connection for cloud access and data sharing.
SOFTWARE:
Windows 10/11 or Ubuntu 22.04 LTS
Python (Scrapy, OpenCV), Power BI Desktop
MySQL
Git / GitHub Desktop