PLAKSHA UNIVERSITY, MONSOON SEMESTER AY 2024-25
Course Code: FM 217
Course Title: Introduction to Data Science
Course Credits: 3 L / T / P: 2/0/1
____________________________________________________________________________________________
Course intended for: Freshmore III Sem
Prerequisites:
Class Schedule: WF 9:00 AM - 9:50 AM, 11:00 AM - 11:50 AM
Classroom No. 1001 / 1002
Lab Schedule: MTWF 2:00 PM - 3:50 PM
Lab room number: 2102
____________________________________________________________________________________________
Lead Instructor: Mayank Ratan Bhardwaj, Assistant Professor
Office No. A1204
Email ID: [email protected]
Co-Instructor(s): Prof. Anish Chowdhury, Prof. Sebastian
Office No.
Email ID: [email protected]
Teaching Fellows: Multiple
Course Description
This course offers an introduction to the area of Data Science, combining scientific
methods, visualization, statistics, and computing to extract meaningful insights from
data. The course will introduce the students to data in various forms, the strategies
used for collecting data, and techniques to visualize the data for exploratory analysis.
The students will get hands-on practice to develop intuition for forming hypotheses and
testing them using the available data or designing strategies for collecting appropriate
data. They will also learn techniques for fitting models to data in order to extract more
complex relations between data attributes.
Course Overview
Data today is ubiquitous and is growing at an exponential rate. We are collecting data
from various sources in different modes and formats. Collecting and making sense of
this data could be a herculean task if not done scientifically. This course covers the
modalities and formats of data, as well as the tools employed in managing, exploring,
and analyzing data. The course starts with a gentle introduction to some tools in the
Python programming language that are commonly used for data analysis. Hands-on
exposure will be provided for loading data and obtaining data summaries. This will be
followed by exposure to plotting techniques and data visualization for
exploratory analysis. We shall then discuss scientific methods for obtaining data and
critique common approaches to data collection. This will be followed by a discussion
of developing hypotheses, designing strategies to collect data to test them, and using
statistical methods to draw conclusions from the data. We will then learn regression
methods to analyze relationships between data attributes and fit functions that express
such relationships mathematically. We shall finish by illustrating how some of these
techniques are used in machine learning.
Learning Outcomes:
After completing this course, students should be able to:
1. Learn the basic vocabulary of data science.
2. Understand data formats and statistical methods.
3. Visualize data and make inferences.
4. Interpret data relationships and results of statistical analysis.
5. Evaluate the pros and cons of various data formats, and critique various methods for
collecting data and drawing conclusions from data.
6. Develop a sequence of data visualization and analysis steps to gain insights from
real-life data.
Recommended Textbook:
An Introduction to Statistical Learning with Applications in Python - James, Witten, Hastie,
Tibshirani and Taylor [Springer].
Assessments and Grading: [All Freshmore courses will have relative grading]
Note: Courses without an exam will have an end-semester jury worth a
maximum of 40% of the total.
Exams: 30%
o Final Comprehensive Exam (30%): Programming (50%), Theory (50%)
Date: Dec 14 (tentative)
Lab Participation: 5%
Attendance: 10% (mandated for all courses)
Matrix:
More than 80% = 10 marks (full)
70 to 80% = 8 marks
60 to 70% = 6 marks
Less than 60% = 0 marks
Quizzes: 40% (Best 2 out of 3)
o Quiz 1: End of week 3 (20%) [31 Aug]
o Quiz 2: End of week 6 (20%) [21 Sep]
o Quiz 3: End of week 10 (20%) [19 Oct]
Assignments: 15%
o Assignment 1: (Basic Data Analysis using Python) End of week 5 (5%)
o Assignment 2: (Statistical Inferences on Real World Data) End of week 10 (5%)
o Assignment 3: (Regression/Classification of data) End of week 15 (5%)
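As an illustration only, the weights listed above can be combined into a raw course total with a few lines of Python. This is a minimal sketch, not an official grade calculator: the function names are hypothetical, all component scores are assumed to be percentages from 0 to 100, and an attendance of exactly 70% is assumed to fall in the higher band because the matrix does not say. Since Freshmore courses use relative grading, the raw total shown here would not by itself determine the final letter grade.

```python
# Weights (percent of the course total) taken from the grading scheme above.
EXAM_WEIGHT = 30
LAB_WEIGHT = 5
QUIZ_WEIGHT_EACH = 20        # best 2 of 3 quizzes count, 20% each
ASSIGNMENT_WEIGHT_EACH = 5   # three assignments, 5% each


def attendance_marks(attendance_pct):
    """Map an attendance percentage to marks out of 10 using the matrix above.

    Assumption: exactly 70% is placed in the higher (8-mark) band, since the
    matrix lists 70 in two bands.
    """
    if attendance_pct > 80:
        return 10
    if attendance_pct >= 70:
        return 8
    if attendance_pct >= 60:
        return 6
    return 0


def raw_course_total(exam, lab, attendance_pct, quizzes, assignments):
    """Return the raw weighted total out of 100 (before relative grading).

    exam, lab, and every entry of quizzes and assignments are scores in percent.
    """
    best_two = sorted(quizzes, reverse=True)[:2]   # best 2 out of 3 quizzes
    total = exam * EXAM_WEIGHT / 100
    total += lab * LAB_WEIGHT / 100
    total += attendance_marks(attendance_pct)
    total += sum(q * QUIZ_WEIGHT_EACH / 100 for q in best_two)
    total += sum(a * ASSIGNMENT_WEIGHT_EACH / 100 for a in assignments)
    return total


# Made-up scores for a hypothetical student.
example_total = raw_course_total(
    exam=70, lab=80, attendance_pct=75,
    quizzes=[60, 90, 80], assignments=[100, 80, 90],
)
print(round(example_total, 2))  # 80.5
```

In the example, the lowest quiz score (60) is dropped, and 75% attendance earns 8 of the 10 attendance marks.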
Weekly Class Plan
Date      Topics                                    Assignments / Due date
Week 1    Introduction to Data Handling in Python
Week 2    Handling various Data formats
Week 3    Data Wrangling
Week 4    Basic Statistics Refresher
Week 5    Plotting and Visualization in Python      Assignment 1
Week 6    Sampling Strategies                       Assignment 1 Submission
Week 7    Hypothesis Testing
Week 8    Hypothesis Testing II
Week 9    Analysis of Variance
Week 10   Linear Regression                         Assignment 2
Week 11   Multiple Regression                       Assignment 2 Submission
Week 12   Logistic Regression
Week 13   Classification
Week 14   Ensemble Techniques and Clustering
Week 15   Projects / Demo                           Assignment 3
Week 16   Revision                                  Assignment 3 Submission