ICS 054 Data Analytics
Course Outcome ( CO) Bloom’s Knowledge Level (KL)
At the end of course , the student will be able to :
Describe the life cycle phases of Data Analytics through discovery, planning and K1,K2
CO 1
building.
CO 2 Understand and apply Data Analysis Techniques. K2, K3
CO 3 Implement various Data streams. K3
CO 4 Understand item sets, Clustering, frame works & Visualizations. K2
CO 5 Apply R tool for developing and evaluating real time applications. K3,K5,K6
DETAILED SYLLABUS 3-0-0
Unit Topic Proposed
Lecture
Introduction to Data Analytics: Sources and nature of data, classification of data
(structured, semi-structured, unstructured), characteristics of data, introduction to Big Data
platform, need of data analytics, evolution of analytic scalability, analytic process and
I 08
tools, analysis vs reporting, modern data analytic tools, applications of data analytics.
Data Analytics Lifecycle: Need, key roles for successful analytic projects, various phases
of data analytics lifecycle – discovery, data preparation, model planning, model building,
communicating results, operationalization.
Data Analysis: Regression modeling, multivariate analysis, Bayesian modeling, inference
and Bayesian networks, support vector and kernel methods, analysis of time series: linear
II systems analysis & nonlinear dynamics, rule induction, neural networks: learning and 08
generalisation, competitive learning, principal component analysis and neural networks,
fuzzy logic: extracting fuzzy models from data, fuzzy decision trees, stochastic search
methods.
Mining Data Streams: Introduction to streams concepts, stream data model and
architecture, stream computing, sampling data in a stream, filtering streams, counting
III 08
distinct elements in a stream, estimating moments, counting oneness in a window,
decaying window, Real-time Analytics Platform ( RTAP) applications, Case studies – real
time sentiment analysis, stock market predictions.
Frequent Itemsets and Clustering: Mining frequent itemsets, market based modelling,
Apriori algorithm, handling large data sets in main memory, limited pass algorithm,
IV 08
counting frequent itemsets in a stream, clustering techniques: hierarchical, K-means,
clustering high dimensional data, CLIQUE and ProCLUS, frequent pattern based clustering
methods, clustering in non-euclidean space, clustering for streams and parallelism.
Frame Works and Visualization: MapReduce, Hadoop, Pig, Hive, HBase, MapR,
Sharding, NoSQL Databases, S3, Hadoop Distributed File Systems, Visualization: visual
V data analysis techniques, interaction techniques, systems and applications. 08
Introduction to R - R graphical user interfaces, data import and export, attribute and data
types, descriptive statistics, exploratory data analysis, visualization before analysis,
analytics for unstructured data.
Text books:
1. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer
2. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press.
3. Bill Franks, Taming the Big Data Tidal wave: Finding Opportunities in Huge Data Streams with Advanced
Analytics, John Wiley & Sons.
4. John Garrett, Data Analytics for IT Networks : Developing Innovative Use Cases, Pearson Education
5. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business