SEMESTER - I
24PBDPC1 BIG DATA MINING AND ANALYTICS L T P C
3 0 0 3
SDG NO. 4
OBJECTIVES:
To understand the computational approaches to modeling and feature extraction
To understand the need for MapReduce and its applications
To understand the various search algorithms applicable to Big Data
To analyse and interpret streaming data
To learn how to handle large data sets in main memory, and to learn the
various clustering techniques applicable to Big Data.
UNIT I DATA MINING AND LARGE SCALE FILES 9
Introduction to Statistical Modeling – Machine Learning – Computational
Approaches to Modeling – Summarization – Feature Extraction – Statistical
Limits on Data Mining – Distributed File Systems – MapReduce – Algorithms
Using MapReduce – Efficiency of Cluster Computing Techniques.
UNIT II SIMILAR ITEMS 9
Nearest Neighbor Search – Shingling of Documents – Similarity-Preserving
Summaries – Locality-Sensitive Hashing for Documents – Distance Measures –
Theory of Locality-Sensitive Functions – LSH Families – Methods for High
Degrees of Similarity.
UNIT III MINING DATA STREAMS 9
Stream Data Model – Sampling Data in the Stream – Filtering Streams –
Counting Distinct Elements in a Stream – Estimating Moments – Counting
Ones in a Window – Decaying Windows
UNIT IV LINK ANALYSIS AND FREQUENT ITEMSETS 9
PageRank – Efficient Computation of PageRank – Topic-Sensitive PageRank –
Link Spam – Market Basket Model – A-Priori Algorithm – Handling Larger
Datasets in Main Memory – Limited-Pass Algorithms – Counting Frequent Itemsets.
UNIT V CLUSTERING 9
Introduction to Clustering Techniques – Hierarchical Clustering –
K-Means Algorithms – CURE – Clustering in Non-Euclidean Spaces –
Streams and Parallelism – Case Study: Advertising on the Web –
Recommendation Systems
TOTAL: 45 PERIODS
TEXT BOOKS:
1. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, “Mining of
Massive Datasets”, Cambridge University Press, Second Edition, 2014.
2. Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts
and Techniques”, Morgan Kaufmann Publishers, Third Edition,
2011.
REFERENCES:
1. Ian H. Witten, Eibe Frank, “Data Mining: Practical Machine Learning
Tools and Techniques”, Morgan Kaufmann Publishers, Third Edition,
2011.
2. David Hand, Heikki Mannila, Padhraic Smyth, “Principles of Data
Mining”, MIT Press, 2001.
WEB REFERENCES:
1. https://swayam.gov.in/nd2_arp19_ap60/preview
2. https://nptel.ac.in/content/storage2/nptel_data3/html/mhrd/ict/text/106104189/lec1.pdf
ONLINE RESOURCES:
1. https://examupdates.in/big-data-analytics/
2. https://www.tutorialspoint.com/big_data_analytics/index.htm
3. https://www.tutorialspoint.com/data_mining/index.htm
OUTCOMES:
Upon completion of the course, the student should be able to
1. Design algorithms that employ the MapReduce technique to solve Big
Data problems.
2. Design algorithms for Big Data by selecting an appropriate feature set.
3. Design algorithms for handling petabyte-scale datasets.
4. Design algorithms and propose solutions for Big Data by optimizing
main memory consumption.
5. Design solutions for problems in Big Data by suggesting
appropriate clustering techniques.