DATA SCIENCE & ANALYTICS
Course Code : CSE3105 Credits : 02
Credit Hours : 02/week Exam Hours : 03
Content of the Course:
1. Introduction to data and science
   Lesson Plan: Types of Data, Scales of measurement, Data sets, Nature of Data Sets, Data Science process
   Sources of Content:
     [Rachel Schutt, Cathy O'Neil - Doing Data Science_ Straight Talk from the Frontline-O'Reilly Media (2013).pdf]
   Estimated Question Distribution: 10%

2. Data Visualization & Representation
   Lesson Plan: Graphical methods: histograms, Line graph, Bar chart, Scatter-plot, others. Numerical methods: the average, the standard deviation, etc. Tabular methods: contingency tables, others. Data file format, Vector Space Model, Bag of Words
   Sources of Content:
     [Section_2.1_2.2_data_types_and_errors.pdf]
     [Summarizing and Exploring Data.pdf]
     [06-VectorSpaceModel.pdf]

3. Exploratory Data Analysis
   Lesson Plan: Detection of mistakes, Relationships among the explanatory variables, Relationships between explanatory and outcome variables. Types of EDA are univariate non-graphical, multivariate non-graphical, univariate graphical, and multivariate graphical.
   Sources of Content:
     [Exploratory Data Analysis.pdf]
   Estimated Question Distribution: 10%

4. Data Pre-processing-I
   Lesson Plan: Data Quality, Data Cleaning: Missing Values, Noisy Data, Data Cleaning Process, Data Integration: The Entity Identification Problem, Redundancy and Correlation Analysis
   Sources of Content:
     [illinois_Data_Preprocessing.pdf]
   Estimated Question Distribution: 20%

5. Data Pre-processing-II
   Lesson Plan: Data Reduction: Data Reduction Strategies, Attribute Subset selection, Clustering. Data Transformation and Data Discretization: Data Transformation by Normalization, Discretization by Binning
   Sources of Content:
     [illinois_Data_Preprocessing.pdf]

6. Knowledge Discovery in Databases
   Lesson Plan: KDD Process, Database Issues, Databases and Knowledge, Discovered Knowledge, Discovery Algorithms, Application Issues, Introduction to Data Mining
   Sources of Content:
     [1992_Frawley_Knowledge discovery in databases An overview.pdf]
     [1996_Fayyad_From data mining to knowledge discovery in databases.pdf]
   Estimated Question Distribution: 5%

7. Statistical Learning
   Lesson Plan: Why and how to estimate predictive/descriptive models, The Trade-Off Between Prediction Accuracy and Model Interpretability, Supervised Versus Unsupervised Learning, Regression Versus Classification Problems, Assessing Model Accuracy: Measuring the Quality of Fit, The Bias-Variance Trade-Off
   Sources of Content:
     [Introduction to Statistical Learning With Applications in R.pdf]
   Estimated Question Distribution: 50%

8. Predictive Model: Regression
   Lesson Plan: Linear Regression: Simple Linear Regression: Estimating the Coefficients, Assessing the Accuracy of the Coefficient Estimates, Assessing the Accuracy of the Model, Multiple Linear Regression: Estimating the Regression Coefficients
   Sources of Content:
     [Introduction to Statistical Learning With Applications in R.pdf]

9. Predictive Model: Classification
   Lesson Plan: Bayes theorem, Naive Bayes, Naive Bayes Classifier, Text Classification, K-nearest-neighbor
   Sources of Content:
     [naivebayes.pdf]
     [knn-1-10.pdf]

10. Clustering
   Lesson Plan: Unsupervised learning, Types of clustering, K-Means
   Sources of Content:
     [clustering_k-means.pdf]

11. Metrics for Machine Learning
   Lesson Plan: Similarity and Dissimilarity measures, evaluation metrics
   Sources of Content:
     [http://jcsites.juniata.edu/faculty/rhodes/ml/simdissim.htm]
     [evaluation_metrics_fall2019-6-22.pdf]
     [https://drive.google.com/file/d/1Rh5CnVtSrJkSiqhGsdTAffNV_ZMM1bG8/view]

12. Others
   Lesson Plan: Inductive Software Engineering, Principles of Inductive Software Engineering, Data Journalism
   Sources of Content:
     [2016 Inductive Software Engineering.pdf]
     [Rachel Schutt, Cathy O'Neil - Doing Data Science_ Straight Talk from the Frontline-O'Reilly Media (2013).pdf]
   Estimated Question Distribution: 5%
Text Books:
(1) Rachel Schutt, Cathy O'Neil. “Doing Data Science: Straight Talk from the Frontline”. O'Reilly.
(2) Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. “An Introduction to
Statistical Learning: With Applications in R”. Springer.
(3) Joseph Adler. “R in a Nutshell”. O’Reilly.
Reference Books:
(1) Salvador García, Julián Luengo, Francisco Herrera, “Data Preprocessing in Data
Mining”, Springer.
(2) Russell A. Poldrack, “Statistical Thinking for the 21st Century”.