SATHYABAMA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL OF SCIENCE AND HUMANITIES
SBSB5303 DATA SCIENCE
L: 4  T: 0  P: 0  EL: 0  Credits: 4  Total Marks: 100
COURSE OBJECTIVES
➢ To study the basic concepts of Data Science and data life cycle
➢ To understand the theoretical and mathematical aspects of Data Science models
➢ To learn common random variables and their uses, and the use of empirical
distributions
UNIT 1 INTRODUCTION TO DATA SCIENCE 12 Hrs.
What is Data Science? – The data life cycle: pre-processing, analysis, post-processing – Pre-processing: data
gathering, cleansing, visualization, and understanding (mean, variance, standard deviation,
percentiles) – Data storage (relational databases, e.g. MySQL).
UNIT 2 APPROACHING ANALYTICS PROBLEMS 12 Hrs.
Key roles for a successful analytics project – Discovery: business domain, resources, problem framing, key
stakeholders, analytics sponsors, initial hypotheses, data sources – Data Preparation: learning about
the data, conditioning – Model Planning: data exploration, model selection – Model Building: common
tools for model building – Communicate Results: analysis over the different models – Operationalize:
moving the model to the deployment environment – Analytics plan.
UNIT 3 INTRODUCTION TO R 12 Hrs.
Introduction to R – R graphical user interfaces – Data import and export – Attributes and data types –
Vectors – Arrays and matrices – Data frames – Lists – Factors – Contingency tables – Descriptive statistics – Model
building, evaluation and deployment – Hypothesis testing – Null and alternative hypotheses –
Probability distributions – Statistical models in R – Data distribution.
UNIT 4 MODELING METHODS 12 Hrs.
Choosing and evaluating models – mapping problems to machine learning, evaluating clustering
models, validating models – cluster analysis – K-means algorithm, Naïve Bayes – Memorization
Methods – Linear and logistic regression – unsupervised methods.
UNIT 5 DELIVERING RESULTS 12 Hrs.
Documentation and deployment – producing effective presentations – Introduction to graphical analysis –
plot() function – displaying multivariate data – matrix plots – multiple plots in one window – exporting graphs
– using graphics parameters. Case studies.
Max. 60 Hrs.
COURSE OUTCOMES
On completion of the course, the student will be able to
CO1 - Understand the key concepts in data science, including tools and approaches.
CO2 - Apply a suitable data science technique to solve an information analytics problem.
CO3 - Comprehend basic methods of processing data from real-world problems.
CO4 - Analyse and validate the models using appropriate performance metrics.
CO5 - Understand the various techniques in data science.
CO6 - Present the results using effective visualization techniques.
M.SC COMPUTER SCIENCE 21 REGULATION 2024
TEXT / REFERENCE BOOKS
1. David Dietrich, Barry Heller, Beibei Yang, “Data Science and Big Data Analytics”, EMC Education
Services, 2015
2. Nina Zumel, John Mount, “Practical Data Science with R”, Manning Publications, 2014
3. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, “Mining of Massive Datasets”, Cambridge
University Press, 2014
4. Glenn J. Myatt, Wayne P. Johnson, “Making Sense of Data I: A Practical Guide to Exploratory
Data Analysis and Data Mining”, John Wiley & Sons, Second Edition, 2014
5. Jeffrey S. Saltz, “An Introduction to Data Science”, Sage Publications Inc, Second Edition, 20112
6. W. N. Venables, D. M. Smith and the R Core Team, “An Introduction to R”, 2013
END SEMESTER EXAMINATION QUESTION PAPER PATTERN
Max. Marks: 100 Exam Duration: 3 Hrs.
PART A: 6 Questions of 5 marks each – No choice 30 Marks
PART B: 2 Questions from each unit with internal choice, each carrying 14 marks 70 Marks
(Out of 100 marks, a maximum of 10% may be allotted to problems.)