BIG Data Analytics 21CSH-471: Computer Science & Engineering

The document outlines the curriculum for a Big Data Analytics course at Chandigarh University, covering key topics such as Big Data frameworks, SQL vs. NoSQL databases, and IBM Watson's role in Big Data. It details course outcomes, HBase data models, CRUD operations, and the characteristics of NoSQL graph databases like Neo4j. Additionally, it provides references for further reading and web resources for data analytics.

Uploaded by

Hrithik Singh

Computer Science & Engineering

CHANDIGARH UNIVERSITY, MOHALI

BIG Data Analytics

21CSH-471

By: Urvashi, Assistant Professor (Chandigarh University)
Contents to be covered in UNIT 2

UNIT-2: Big Data Technologies (Contact Hours: 15)

Chapter-1 (Big Data Frameworks): Hadoop, Apache Spark, and their Comparison; NoSQL databases: MongoDB, Cassandra, and HBase; Big Data Visualization Tools: Tableau, Power BI, and Zeppelin; Real-Time Big Data Processing: Apache Storm and Flink; Emerging trends in Big Data Technologies.

Chapter-2 (Big SQL and NoSQL Databases): Overview of SQL vs. NoSQL: Differences and Use Cases; Introduction to Big SQL: Big SQL Features (Scalability, support for structured and unstructured data), Query optimization Techniques in Big SQL; NoSQL Database Types: Key-Value stores (Redis, DynamoDB), Document stores (MongoDB, CouchDB), Column-family stores (Cassandra, HBase), Graph Databases (Neo4j); Advantages and limitations of Big SQL and NoSQL.

Chapter-3 (AI in Big Data): Introduction to IBM Watson: Overview and capabilities of Watson AI, Watson’s role in Big Data and decision-making; Key Watson Services: Watson Discovery, Watson Studio, and Watson Assistant, Integration of Watson with Big Data tools; AI and Machine Learning Applications in Big Data: Natural Language Processing (NLP), Sentiment Analysis and Predictive Analytics.

Course Outcomes

CO1 Understand the Fundamentals of Big Data.

CO2 Master Big Data Architecture and Tools

CO3 Explore the Hadoop Ecosystem and Data Processing Models

CO4 Develop Data Science Skills and Tools

CO5 Implement Real-Time Data Analytics and Visualization

HBase Data Model and Versioning

• Data organization concepts

• Namespaces
• Tables
• Column families
• Column qualifiers
• Columns
• Rows
• Data cells
• Data is self-describing
HBase Data Model and Versioning (cont'd.)
• HBase stores multiple versions of data items
• A timestamp is associated with each version
• Each row in a table has a unique row key
• A table is associated with one or more column families
• Column qualifiers can be dynamically specified as new table rows are created and inserted
• A namespace is a collection of tables
• A cell holds a basic data item
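
The versioning idea above can be illustrated with a small in-memory model. This is only a sketch in Python, not how HBase is implemented: the class name `VersionedTable`, its methods, the integer timestamps, and the value 'Manager' are all assumptions made for illustration.

```python
class VersionedTable:
    """Toy model: row key -> 'family:qualifier' -> list of (timestamp, value)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        self.rows = {}

    def put(self, row_key, column, value, timestamp):
        # The qualifier after ':' is free-form; only the family is checked.
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows.setdefault(row_key, {}).setdefault(column, [])
        cell.append((timestamp, value))  # keep old versions, add a new one

    def get(self, row_key, column, max_versions=1):
        # Return the newest versions first, ordered by timestamp.
        cell = self.rows.get(row_key, {}).get(column, [])
        newest_first = sorted(cell, key=lambda tv: tv[0], reverse=True)
        return newest_first[:max_versions]


table = VersionedTable(["Name", "Details"])
table.put("row1", "Details:Job", "Engineer", timestamp=1)
table.put("row1", "Details:Job", "Manager", timestamp=2)  # hypothetical update

latest = table.get("row1", "Details:Job")
history = table.get("row1", "Details:Job", max_versions=2)
```

Here `latest` holds only the newest version of the cell, while `history` also returns the older value with its timestamp.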
(a) Creating a table:
create 'EMPLOYEE', 'Name', 'Address', 'Details'
(b) Inserting some row data in the EMPLOYEE table:
put 'EMPLOYEE', 'row1', 'Name:Fname', 'John'
put 'EMPLOYEE', 'row1', 'Name:Lname', 'Smith'
put 'EMPLOYEE', 'row1', 'Name:Nickname', 'Johnny'
put 'EMPLOYEE', 'row1', 'Details:Job', 'Engineer'
put 'EMPLOYEE', 'row1', 'Details:Review', 'Good'
put 'EMPLOYEE', 'row2', 'Name:Fname', 'Alicia'
put 'EMPLOYEE', 'row2', 'Name:Lname', 'Zelaya'
put 'EMPLOYEE', 'row2', 'Name:MName', 'Jennifer'
put 'EMPLOYEE', 'row2', 'Details:Job', 'DBA'
put 'EMPLOYEE', 'row2', 'Details:Supervisor', 'James Borg'
put 'EMPLOYEE', 'row3', 'Name:Fname', 'James'
put 'EMPLOYEE', 'row3', 'Name:Minit', 'E'
put 'EMPLOYEE', 'row3', 'Name:Lname', 'Borg'
put 'EMPLOYEE', 'row3', 'Name:Suffix', 'Jr.'
put 'EMPLOYEE', 'row3', 'Details:Job', 'CEO'
put 'EMPLOYEE', 'row3', 'Details:Salary', '1,000,000'

(c) Some HBase basic CRUD operations:

Creating a table: create <tablename>, <column family>, <column family>, ...
Inserting Data: put <tablename>, <rowid>, <column family>:<column qualifier>, <value>
Reading Data (all data in a table): scan <tablename>
Retrieve Data (one item): get <tablename>, <rowid>

Figure 24.3 Examples in HBase. (a) Creating a table called EMPLOYEE with three column families: Name, Address, and Details. (b) Inserting some row data in the EMPLOYEE table; different rows can have different self-describing column qualifiers (Fname, Lname, Nickname, Mname, Minit, Suffix, ... for column family Name; Job, Review, ...
HBase CRUD Operations
• Provides only low-level CRUD (create, read, update, delete) operations
• Application programs implement more complex operations
• Create
- Creates a new table and specifies one or more column families associated with the table
• Put
- Inserts new data or new versions of existing data items
• Get
- Retrieves data associated with a single row
• Scan
- Retrieves all the rows
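
To make the four operations concrete, here is a minimal in-memory sketch in Python. It is not the HBase client API: the function names simply mirror the shell commands, the `tables` dictionary is a hypothetical stand-in for the store, and versions are ignored (a second put to the same cell overwrites the first).

```python
tables = {}  # hypothetical stand-in for the database: table name -> table data


def create(table, *column_families):
    # A table is created with its column families declared up front.
    tables[table] = {"families": set(column_families), "rows": {}}


def put(table, row_key, column, value):
    # 'column' is written as 'family:qualifier'; qualifiers need no declaration.
    family = column.split(":", 1)[0]
    if family not in tables[table]["families"]:
        raise KeyError(f"unknown column family: {family}")
    tables[table]["rows"].setdefault(row_key, {})[column] = value


def get(table, row_key):
    # Retrieve the data of a single row (empty if the row does not exist).
    return dict(tables[table]["rows"].get(row_key, {}))


def scan(table):
    # Retrieve all rows, in lexicographic row-key order.
    rows = tables[table]["rows"]
    return [(key, dict(rows[key])) for key in sorted(rows)]


create("EMPLOYEE", "Name", "Address", "Details")
put("EMPLOYEE", "row2", "Name:Fname", "Alicia")
put("EMPLOYEE", "row1", "Name:Fname", "John")
put("EMPLOYEE", "row1", "Details:Job", "Engineer")

row1 = get("EMPLOYEE", "row1")
all_keys = [key for key, _ in scan("EMPLOYEE")]
```

Anything beyond these primitives, such as a join or an aggregate, would have to be written by the application on top of `get` and `scan`.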
HBase Storage and Distributed System Concepts
• Each HBase table is divided into several regions
- Each region holds a range of the row keys in the table
- Row keys must be lexicographically ordered
• Each region has several stores
- Column families are assigned to stores
• Regions are assigned to region servers for storage
• Master server is responsible for monitoring the region servers
• HBase uses Apache Zookeeper and HDFS
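
Because each region holds a contiguous range of lexicographically ordered row keys, finding the region for a key amounts to a binary search over the region start keys. The Python sketch below illustrates this under stated assumptions: the start keys, the server names `rs1` to `rs3`, and the function `locate_region` are hypothetical, and the real lookup in HBase involves more machinery than shown here.

```python
import bisect

# Hypothetical layout: region i covers keys from region_start_keys[i]
# (inclusive) up to the next start key (exclusive). The list must be sorted.
region_start_keys = ["", "g", "n", "t"]
region_servers = ["rs1", "rs2", "rs1", "rs3"]  # assumed region-to-server assignment


def locate_region(row_key):
    """Return (region index, region server) responsible for row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position contains the key.
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return index, region_servers[index]


first = locate_region("alicia")
boundary = locate_region("g")
middle = locate_region("row1")
last = locate_region("zelaya")
```

A key equal to a start key, such as 'g' here, falls in the region that begins at that key.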
NOSQL Graph Databases: Neo4j

Graph databases
• Data represented as a graph
• Collection of vertices (nodes) and edges
• Possible to store data associated with both individual nodes and individual edges
Neo4j
• Open source system
• Uses concepts of nodes and relationships
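
A small Python sketch can show what it means for both nodes and relationships to carry data. This is not Neo4j's storage model or driver API: the `Graph` class, its method names, and the sample property values (such as 'ProductX' and 32.5 hours) are assumptions chosen for illustration.

```python
class Graph:
    """Toy property graph: labeled nodes and typed, directed relationships."""

    def __init__(self):
        self.nodes = {}          # node id -> {"labels": set, "props": dict}
        self.relationships = []  # (start id, type, end id, props)

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = {"labels": set(labels), "props": props}

    def add_relationship(self, start_id, rel_type, end_id, **props):
        # Both endpoints must already exist as nodes.
        if start_id not in self.nodes or end_id not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.relationships.append((start_id, rel_type, end_id, props))

    def outgoing(self, start_id, rel_type):
        # Follow relationships of one type leaving a node.
        return [
            (end_id, props)
            for s, t, end_id, props in self.relationships
            if s == start_id and t == rel_type
        ]


g = Graph()
g.add_node("e1", ["EMPLOYEE"], Ename="John")      # hypothetical data
g.add_node("p1", ["PROJECT"], Pname="ProductX")   # hypothetical data
g.add_relationship("e1", "WorksOn", "p1", Hours=32.5)

projects = [
    (g.nodes[end_id]["props"]["Pname"], props["Hours"])
    for end_id, props in g.outgoing("e1", "WorksOn")
]
```

The `Hours` value lives on the relationship itself rather than on either node, which is the point of storing data on edges.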
Neo4j (cont'd.)
• Path
- Traversal of part of the graph
- Typically used as part of a query to specify a pattern
• Schema optional in Neo4j
• Indexing and node identifiers
- Users can create indexes for the collection of nodes that have a particular label
- One or more properties can be indexed
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe
The Cypher Query Language of Neo4j:

• Cypher query made up of clauses

• Result from one clause can be the input to the next clause in the query
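
The chaining of clauses can be mimicked in Python by treating each clause as a function that takes the previous clause's rows and returns new rows. This is a sketch of the idea only, not how Neo4j evaluates Cypher: the function names, the `WORKS_ON` sample data, and the threshold of two projects are assumptions for illustration.

```python
from collections import Counter

# Hypothetical (employee, project, hours) relationships.
WORKS_ON = [
    ("John", "ProductX", 32.5),
    ("John", "ProductY", 7.5),
    ("John", "ProductZ", 10.0),
    ("Alicia", "Newbenefits", 30.0),
]


def match_clause(edges):
    # MATCH: bind variables e and p for every matching relationship.
    return [{"e": e, "p": p, "hours": h} for e, p, h in edges]


def with_clause(bindings):
    # WITH e, COUNT(p) AS numOfprojs: group by employee and count projects.
    counts = Counter(b["e"] for b in bindings)
    return [{"e": e, "numOfprojs": n} for e, n in counts.items()]


def where_clause(bindings, predicate):
    # WHERE: keep only the rows that satisfy the condition.
    return [b for b in bindings if predicate(b)]


def return_clause(bindings, *keys):
    # RETURN: project the requested variables.
    return [tuple(b[k] for k in keys) for b in bindings]


# Each clause consumes the output of the one before it.
matched = match_clause(WORKS_ON)
grouped = with_clause(matched)
filtered = where_clause(grouped, lambda b: b["numOfprojs"] > 2)
result = return_clause(filtered, "e", "numOfprojs")
```

With this sample data, only John works on more than two projects, so `result` contains a single row.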
The Cypher Query Language of Neo4j (cont'd.)
(d) Examples of simple Cypher queries:
1. MATCH (d:DEPARTMENT {Dno: '5'})-[:LocatedIn]->(loc)
   RETURN d.Dname, loc.Lname
2. MATCH (e:EMPLOYEE {Empid: '2'})-[w:WorksOn]->(p)
   RETURN e.Ename, w.Hours, p.Pname
3. MATCH (e)-[w:WorksOn]->(p:PROJECT {Pno: 2})
   RETURN p.Pname, e.Ename, w.Hours
4. MATCH (e)-[w:WorksOn]->(p)
   RETURN e.Ename, w.Hours, p.Pname
   ORDER BY e.Ename
5. MATCH (e)-[w:WorksOn]->(p)
   RETURN e.Ename, w.Hours, p.Pname
   ORDER BY e.Ename
   LIMIT 10
6. MATCH (e)-[w:WorksOn]->(p)
   WITH e, COUNT(p) AS numOfprojs
   WHERE numOfprojs > 2
   RETURN e.Ename, numOfprojs
   ORDER BY numOfprojs
7. MATCH (e)-[w:WorksOn]->(p)
   RETURN e, w, p
   ORDER BY e.Ename
   LIMIT 10
8. MATCH (e:EMPLOYEE ...
Figure (cont'd.) Examples in Neo4j using the Cypher language. (d) Examples of Cypher queries.
Neo4j Interfaces and Distributed System Characteristics
• Enterprise edition versus community edition
- Enterprise edition supports caching, clustering of data, and locking
• Graph visualization interface
- Subset of nodes and edges in a database graph can be displayed as a graph
- Used to visualize query results
• Master-slave replication
• Caching
• Logical logs
Summary

• NOSQL systems focus on storage of “big data”

• General categories
• Document-based
• Key-value stores
• Column-based
• Graph-based
• Some systems use techniques spanning two or more categories
• Consistency paradigms
Reference Books
TEXT BOOKS

1. Mohammed Guller, “Big Data Analytics with Spark”, Apress, 2015.
2. Tom Mitchell, “Machine Learning”, McGraw Hill, 3rd Edition, 1997.
3. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Business”, 1st Edition, Wiley CIO Series, 2013.
4. Arvind Sathi, “Big Data Analytics: Disruptive Technologies for Changing the Game”, 1st Edition, IBM Corporation, 2012.

REFERENCE BOOKS

5. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
6. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.
7. Jay Liebowitz, “Big Data and Business Analytics”, CRC Press, 2013.
For more insight
Web sources
1. https://www.alliant.edu/blog/4-top-online-resources-data-analytics?utm_source=chatgpt.com
2. https://www.coursera.org/articles/big-data-technologies?utm_source=chatgpt.com
3. https://careerfoundry.com/en/blog/data-analytics/where-to-find-free-datasets/?utm_source=chatgpt.com
THANK YOU

For queries
Email: [email protected]
