
BIG Data Analytics 21CSH-471: Computer Science & Engineering

The document outlines the curriculum for a Big Data Analytics course at Chandigarh University, covering key topics such as Big Data frameworks, SQL vs. NoSQL databases, and IBM Watson's role in Big Data. It details course outcomes, HBase data models, CRUD operations, and the characteristics of NoSQL graph databases like Neo4j. Additionally, it provides references for further reading and web resources for data analytics.

Uploaded by Hrithik Singh
Copyright © All Rights Reserved

Computer Science & Engineering


CHANDIGARH UNIVERSITY, MOHALI

BIG Data Analytics


21CSH-471

By: Urvashi

Assistant Professor (Chandigarh University)
Contents to be covered in UNIT-2

UNIT-2 Big Data Technologies (Contact Hours: 15)

Chapter-1 Big Data Frameworks: Hadoop, Apache Spark, and their Comparison; NoSQL databases: MongoDB, Cassandra, and HBase; Big Data Visualization Tools: Tableau, Power BI, and Zeppelin; Real-Time Big Data Processing: Apache Storm and Flink; Emerging trends in Big Data Technologies.

Chapter-2 Big SQL and NoSQL Databases: Overview of SQL vs. NoSQL: Differences and Use Cases; Introduction to Big SQL: Big SQL Features (Scalability, support for structured and unstructured data), Query optimization Techniques in Big SQL; NoSQL Database Types: Key-Value stores (Redis, DynamoDB), Document stores (MongoDB, CouchDB), Column-family stores (Cassandra, HBase), Graph Databases (Neo4j); Advantages and limitations of Big SQL and NoSQL.

Chapter-3 AI in Big Data: Introduction to IBM Watson: Overview and capabilities of Watson AI, Watson's role in Big Data and decision-making; Key Watson Services: Watson Discovery, Watson Studio, and Watson Assistant, Integration of Watson with Big Data tools; AI and Machine Learning Applications in Big Data: Natural Language Processing (NLP), Sentiment Analysis and Predictive Analytics.
Course Outcomes

CO1 Understand the Fundamentals of Big Data.

CO2 Master Big Data Architecture and Tools

CO3 Explore the Hadoop Ecosystem and Data Processing Models

CO4 Develop Data Science Skills and Tools

CO5 Implement Real-Time Data Analytics and Visualization

HBase Data Model and Versioning
• Data organization concepts
  - Namespaces
  - Tables
  - Column families
  - Column qualifiers
  - Columns
  - Rows
  - Data cells
• Data is self-describing
HBase Data Model and Versioning (cont'd.)
• HBase stores multiple versions of data items
  - A timestamp is associated with each version
• Each row in a table has a unique row key
• A table is associated with one or more column families
• Column qualifiers can be specified dynamically as new table rows are created and inserted
• A namespace is a collection of tables
• A cell holds a basic data item
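The hierarchy and versioning rules above can be sketched as a small in-memory structure. This is a minimal illustration under stated assumptions: a simple counter stands in for HBase's real timestamps, the version limit is set per table, and the names `ToyTable`, `put`, `get`, and `get_versions` are hypothetical, not the HBase client API.

```python
import itertools
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model: row key -> 'family:qualifier' -> versions.

    Each cell keeps a list of (timestamp, value) pairs, newest first.
    """

    def __init__(self, name, families, max_versions=3):
        self.name = name
        self.families = set(families)        # column families are fixed at creation
        self.max_versions = max_versions     # assumed per-table version limit
        self.rows = defaultdict(dict)
        self._clock = itertools.count(1)     # stand-in for real timestamps

    def put(self, row_key, column, value, timestamp=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        ts = next(self._clock) if timestamp is None else timestamp
        versions = self.rows[row_key].setdefault(column, [])  # qualifier created on demand
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)     # newest first
        del versions[self.max_versions:]                      # drop the oldest extras

    def get(self, row_key, column):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row_key, {}).get(column)
        return versions[0][1] if versions else None

    def get_versions(self, row_key, column):
        """Return all retained (timestamp, value) pairs, newest first."""
        return list(self.rows.get(row_key, {}).get(column, []))


# A namespace is modeled here simply as a dict of tables.
default_ns = {"EMPLOYEE": ToyTable("EMPLOYEE", ["Name", "Address", "Details"], max_versions=2)}
emp = default_ns["EMPLOYEE"]
emp.put("row1", "Details:Job", "Engineer")   # timestamp 1
emp.put("row1", "Details:Job", "Manager")    # timestamp 2 (hypothetical new value)
emp.put("row1", "Details:Job", "Director")   # timestamp 3; timestamp 1 is discarded
print(emp.get("row1", "Details:Job"))            # Director
print(emp.get_versions("row1", "Details:Job"))   # [(3, 'Director'), (2, 'Manager')]
```

In this sketch the version limit belongs to the whole table; in HBase the number of retained versions is configured per column family.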
(a) Creating a table:
create 'EMPLOYEE', 'Name', 'Address', 'Details'
(b) Inserting some row data in the EMPLOYEE table:
put 'EMPLOYEE', 'row1', 'Name:Fname', 'John'
put 'EMPLOYEE', 'row1', 'Name:Lname', 'Smith'
put 'EMPLOYEE', 'row1', 'Name:Nickname', 'Johnny'
put 'EMPLOYEE', 'row1', 'Details:Job', 'Engineer'
put 'EMPLOYEE', 'row1', 'Details:Review', 'Good'
put 'EMPLOYEE', 'row2', 'Name:Fname', 'Alicia'
put 'EMPLOYEE', 'row2', 'Name:Lname', 'Zelaya'
put 'EMPLOYEE', 'row2', 'Name:Mname', 'Jennifer'
put 'EMPLOYEE', 'row2', 'Details:Job', 'DBA'
put 'EMPLOYEE', 'row2', 'Details:Supervisor', 'James Borg'
put 'EMPLOYEE', 'row3', 'Name:Fname', 'James'
put 'EMPLOYEE', 'row3', 'Name:Minit', 'E'
put 'EMPLOYEE', 'row3', 'Name:Lname', 'Borg'
put 'EMPLOYEE', 'row3', 'Name:Suffix', 'Jr.'
put 'EMPLOYEE', 'row3', 'Details:Job', 'CEO'
put 'EMPLOYEE', 'row3', 'Details:Salary', '1,000,000'

(c) Some basic HBase CRUD operations:
Creating a table: create <tablename>, <column family>, <column family>, ...
Inserting data: put <tablename>, <rowid>, <column family>:<column qualifier>, <value>
Reading data (all data in a table): scan <tablename>
Retrieving data (one item): get <tablename>, <rowid>

Figure 24.3 Examples in HBase. (a) Creating a table called EMPLOYEE with three column families: Name, Address, and Details. (b) Inserting some rows in the EMPLOYEE table; different rows can have different self-describing column qualifiers (Fname, Lname, Nickname, Mname, Minit, Suffix, ... for column family Name; Job, Review, Supervisor, Salary for column family Details). (c) Some basic HBase CRUD operations.
HBase CRUD Operations
• Provides only low-level CRUD (create, read, update, delete) operations
• Application programs implement more complex operations
• Create
  - Creates a new table and specifies one or more column families associated with the table
• Put
  - Inserts new data or new versions of existing data items
• Get
  - Retrieves data associated with a single row
• Scan
  - Retrieves all the rows
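The four operations listed above can be sketched against the EMPLOYEE example from Figure 24.3. This is a toy model under stated assumptions: the `ToyHBase` class and its methods are hypothetical (not the HBase shell or Java client), values are plain strings, versions are ignored, and only the Create, Put, Get, and Scan behaviors described on the slide are modeled.

```python
class ToyHBase:
    """Hypothetical in-memory store illustrating create / put / get / scan."""

    def __init__(self):
        # table name -> {"families": set of names, "rows": {row key: {column: value}}}
        self.tables = {}

    def create(self, table, *families):
        """Create: a table is declared together with its column families."""
        self.tables[table] = {"families": set(families), "rows": {}}

    def put(self, table, row, column, value):
        """Put: insert or overwrite one cell; the qualifier needs no declaration."""
        t = self.tables[table]
        family = column.split(":", 1)[0]
        if family not in t["families"]:
            raise KeyError(f"column family {family!r} is not declared for {table}")
        t["rows"].setdefault(row, {})[column] = value

    def get(self, table, row):
        """Get: all cells of a single row (empty dict if the row is absent)."""
        return dict(self.tables[table]["rows"].get(row, {}))

    def scan(self, table):
        """Scan: every row, yielded in sorted row-key order."""
        rows = self.tables[table]["rows"]
        for row in sorted(rows):
            yield row, dict(rows[row])


db = ToyHBase()
db.create("EMPLOYEE", "Name", "Address", "Details")
db.put("EMPLOYEE", "row1", "Name:Fname", "John")
db.put("EMPLOYEE", "row1", "Name:Nickname", "Johnny")
db.put("EMPLOYEE", "row2", "Name:Fname", "Alicia")
db.put("EMPLOYEE", "row2", "Details:Supervisor", "James Borg")

print(db.get("EMPLOYEE", "row1"))  # {'Name:Fname': 'John', 'Name:Nickname': 'Johnny'}
for key, cells in db.scan("EMPLOYEE"):
    print(key, sorted(cells))
# row1 ['Name:Fname', 'Name:Nickname']
# row2 ['Details:Supervisor', 'Name:Fname']
```

The two rows end up with different qualifiers even though none was declared in advance; only the column families had to exist when the table was created.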
HBase Storage and Distributed System Concepts
• Each HBase table is divided into several regions
  - Each region holds a contiguous range of the row keys in the table
  - Rows are kept sorted by row key in lexicographic order
• Each region has several stores
  - Column families are assigned to stores
• Regions are assigned to region servers for storage
• A master server is responsible for monitoring the region servers
• HBase uses Apache ZooKeeper and HDFS
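The region layout described above can be illustrated by locating a row key with a binary search over region start keys. The split points, region names, and server names below are hypothetical, and the round-robin assignment is only a stand-in for whatever balancing the master actually performs.

```python
import bisect

# Hypothetical layout: three regions split at the start keys "row3" and "row6".
SPLIT_KEYS = ["row3", "row6"]                      # must be kept sorted
REGIONS = ["region-A", "region-B", "region-C"]     # [start, row3), [row3, row6), [row6, end)
SERVERS = ["rs1", "rs2"]

# Stand-in for the master: assign regions to region servers round-robin.
ASSIGNMENT = {region: SERVERS[i % len(SERVERS)] for i, region in enumerate(REGIONS)}


def locate(row_key):
    """Return (region, region server) for a row key.

    bisect_right counts how many split keys are <= row_key, which is exactly
    the index of the region whose key range contains row_key.
    """
    idx = bisect.bisect_right(SPLIT_KEYS, row_key)
    region = REGIONS[idx]
    return region, ASSIGNMENT[region]


for key in ["row1", "row3", "row5", "row7", "row10"]:
    print(key, locate(key))
```

Because keys are compared lexicographically, "row10" sorts before "row3" and lands in the first region; this ordering is a common reason for zero-padding numeric parts of row keys.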
NoSQL Graph Databases: Neo4j

Graph databases
• Data represented as a graph
  - A collection of vertices (nodes) and edges
• Possible to store data associated with both individual nodes and individual edges
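A graph in which both nodes and edges carry data can be sketched with a few dataclasses. The labels, relationship type, and property values used here (`EMPLOYEE`, `PROJECT`, `WorksOn`, `Hours`, and the sample names) are hypothetical sample data, and the classes are not a Neo4j API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str
    props: dict = field(default_factory=dict)   # data stored on the node


@dataclass
class Edge:
    src: int
    dst: int
    rel_type: str
    props: dict = field(default_factory=dict)   # data stored on the edge itself


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, src, dst, rel_type, **props):
        self.edges.append(Edge(src, dst, rel_type, props))

    def out_edges(self, node_id, rel_type=None):
        """Outgoing edges of a node, optionally restricted to one relationship type."""
        return [e for e in self.edges
                if e.src == node_id and (rel_type is None or e.rel_type == rel_type)]


g = Graph()
g.add_node(1, "EMPLOYEE", Ename="John Smith")
g.add_node(2, "PROJECT", Pname="ProductX")
g.add_edge(1, 2, "WorksOn", Hours=32.5)   # the hours belong to the relationship

edge = g.out_edges(1, "WorksOn")[0]
print(g.nodes[edge.src].props["Ename"], edge.props["Hours"], g.nodes[edge.dst].props["Pname"])
```

Storing `Hours` on the edge rather than on either node matches the last bullet: the value describes the relationship, not the employee or the project alone.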
Neo4j
• Open-source system
• Uses concepts of nodes and relationships
Neo4j (cont'd.)
• Path
  - Traversal of part of the graph
  - Typically used as part of a query to specify a pattern
• Schema optional in Neo4j
• Indexing and node identifiers
  - Users can create indexes for the collection of nodes that have a particular label
  - One or more properties can be indexed
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe
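Paths and label-based indexes can be illustrated together: an index narrows down the starting nodes, and a traversal then follows a sequence of relationship types. This is a simplified sketch with hypothetical data and function names (`build_index`, `follow`); Neo4j's real index structures and traversal engine work differently.

```python
from collections import defaultdict

# Hypothetical tiny graph: node id -> (label, properties).
# Nodes with different labels carry different properties (schema is optional).
NODES = {
    1: ("EMPLOYEE", {"Empid": "1", "Ename": "John"}),
    2: ("EMPLOYEE", {"Empid": "2", "Ename": "Franklin"}),
    3: ("DEPARTMENT", {"Dno": "5", "Dname": "Research"}),
    4: ("LOCATION", {"Lname": "Houston"}),
}
# Directed relationships: (source id, relationship type, target id)
EDGES = [(1, "WorksFor", 3), (2, "WorksFor", 3), (3, "LocatedIn", 4)]


def build_index(label, prop):
    """Index the nodes that carry `label`, keyed by the value of `prop`."""
    index = defaultdict(list)
    for node_id, (node_label, props) in NODES.items():
        if node_label == label and prop in props:
            index[props[prop]].append(node_id)
    return index


def follow(start_ids, rel_types):
    """Return every path that starts at one of start_ids and follows rel_types in order."""
    paths = [[s] for s in start_ids]
    for rel in rel_types:
        paths = [path + [dst]
                 for path in paths
                 for (src, r, dst) in EDGES
                 if src == path[-1] and r == rel]
    return paths


emp_by_id = build_index("EMPLOYEE", "Empid")
paths = follow(emp_by_id["2"], ["WorksFor", "LocatedIn"])
print(paths)  # [[2, 3, 4]]
```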
The Cypher Query Language of Neo4j
• A Cypher query is made up of clauses
• The result from one clause can be the input to the next clause in the query
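The way rows flow from clause to clause can be sketched as a chain of functions, each consuming the rows produced by the previous one. The sample data, function names, and threshold are hypothetical; the sketch only mimics the semantics of a MATCH ... WITH ... WHERE ... RETURN query and does not reflect how Neo4j plans or executes Cypher.

```python
from collections import Counter

# Hypothetical WorksOn facts: (employee name, project name, hours)
WORKS_ON = [
    ("John", "ProductX", 32.5), ("John", "ProductY", 7.5),
    ("Ramesh", "ProductZ", 40.0),
    ("Joyce", "ProductX", 20.0), ("Joyce", "ProductY", 20.0), ("Joyce", "ProductZ", 5.0),
]


def match_works_on(facts):
    """MATCH (e)-[w:WorksOn]->(p): one row per matched pattern."""
    return [{"e": e, "p": p, "hours": h} for e, p, h in facts]


def with_count(rows):
    """WITH e, COUNT(p) AS numOfprojs: group the incoming rows by employee."""
    counts = Counter(row["e"] for row in rows)
    return [{"e": e, "numOfprojs": n} for e, n in counts.items()]


def where_more_than(rows, threshold):
    """WHERE numOfprojs > threshold: filter the rows produced by WITH."""
    return [row for row in rows if row["numOfprojs"] > threshold]


def return_ordered(rows):
    """RETURN e.Ename, numOfprojs ORDER BY numOfprojs."""
    return [(row["e"], row["numOfprojs"])
            for row in sorted(rows, key=lambda r: r["numOfprojs"])]


# Each clause consumes exactly what the previous clause produced.
result = return_ordered(where_more_than(with_count(match_works_on(WORKS_ON)), 1))
print(result)  # [('John', 2), ('Joyce', 3)]
```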
The Cypher Query Language of Neo4j (cont'd.)

(d) Examples of simple Cypher queries:

1. MATCH (d:DEPARTMENT {Dno: '5'})-[:LocatedIn]->(loc)
   RETURN d.Dname, loc.Lname
2. MATCH (e:EMPLOYEE {Empid: '2'})-[w:WorksOn]->(p)
   RETURN e.Ename, w.Hours, p.Pname
3. MATCH (e)-[w:WorksOn]->(p:PROJECT {Pno: 2})
   RETURN p.Pname, e.Ename, w.Hours
4. MATCH (e)-[w:WorksOn]->(p)
   RETURN e.Ename, w.Hours, p.Pname
   ORDER BY e.Ename
5. MATCH (e)-[w:WorksOn]->(p)
   RETURN e.Ename, w.Hours, p.Pname
   ORDER BY e.Ename
   LIMIT 10
6. MATCH (e)-[w:WorksOn]->(p)
   WITH e, COUNT(p) AS numOfprojs
   WHERE numOfprojs > 2
   RETURN e.Ename, numOfprojs
   ORDER BY numOfprojs
7. MATCH (e)-[w:WorksOn]->(p)
   RETURN e, w, p
   ORDER BY e.Ename
   LIMIT 10
8. MATCH (e:EMPLOYEE …

Figure (cont'd.) Examples in Neo4j using the Cypher language: (d) Examples of Cypher queries.
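The second and third example queries match the same `WorksOn` pattern but fix a property at opposite ends of it: the employee in one case, the project in the other. A minimal sketch of that idea over hypothetical in-memory data (the dictionaries, values, and function names are assumptions, not Neo4j's API):

```python
# Hypothetical sample data
EMPLOYEES = {"1": "John", "2": "Franklin"}                    # Empid -> Ename
PROJECTS = {1: "ProductX", 2: "ProductY"}                     # Pno -> Pname
WORKS_ON = [("1", 1, 32.5), ("1", 2, 7.5), ("2", 2, 10.0)]    # (Empid, Pno, Hours)


def projects_of(empid):
    """Pattern anchored at the employee end: RETURN e.Ename, w.Hours, p.Pname."""
    return [(EMPLOYEES[e], hours, PROJECTS[p])
            for e, p, hours in WORKS_ON if e == empid]


def workers_on(pno):
    """Pattern anchored at the project end: RETURN p.Pname, e.Ename, w.Hours."""
    return [(PROJECTS[p], EMPLOYEES[e], hours)
            for e, p, hours in WORKS_ON if p == pno]


print(projects_of("2"))  # [('Franklin', 10.0, 'ProductY')]
print(workers_on(2))     # [('ProductY', 'John', 7.5), ('ProductY', 'Franklin', 10.0)]
```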
Neo4j Interfaces and Distributed System Characteristics
• Enterprise edition versus community edition
  - Enterprise edition supports caching, clustering of data, and locking
• Graph visualization interface
  - A subset of nodes and edges in a database graph can be displayed as a graph
  - Used to visualize query results
• Master-slave replication
• Caching
• Logical logs
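Master-slave replication driven by a logical log can be illustrated with a toy model in which the master records every write as a log entry and each slave replays the entries it has not yet applied. The classes and the pull-based protocol are hypothetical simplifications, not Neo4j's actual replication mechanism.

```python
class Master:
    """Accepts writes and records each one in an append-only logical log."""

    def __init__(self):
        self.data = {}
        self.log = []                    # ordered (key, value) write records

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Read replica that catches up by replaying the master's log."""

    def __init__(self):
        self.data = {}
        self.applied = 0                 # number of log entries already replayed

    def pull(self, master):
        for key, value in master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(master.log)

    def read(self, key):
        return self.data.get(key)


master, replica = Master(), Slave()
master.write("node:1:Ename", "John")
print(replica.read("node:1:Ename"))   # None: the write has not been replicated yet
replica.pull(master)
print(replica.read("node:1:Ename"))   # John
master.write("node:1:Ename", "John Smith")
replica.pull(master)
print(replica.read("node:1:Ename"), replica.applied)  # John Smith 2
```

The window between a write on the master and the next `pull` is the replication lag that a read from a slave can observe.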
Summary
• NoSQL systems focus on storage of “big data”
• General categories
  - Document-based
  - Key-value stores
  - Column-based
  - Graph-based
• Some systems use techniques spanning two or more categories
• Consistency paradigms
Reference Books
TEXT BOOKS
1. Mohammed Guller, "Big Data Analytics with Spark", Apress, 2015.
2. Tom Mitchell, "Machine Learning", McGraw Hill, 3rd Edition, 1997.
3. Michael Minelli, Michele Chambers, Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Business", 1st Edition, Wiley CIO Series, 2013.
4. Arvind Sathi, "Big Data Analytics: Disruptive Technologies for Changing the Game", 1st Edition, IBM Corporation, 2012.

REFERENCE BOOKS
5. Chris Eaton, Dirk deRoos et al., "Understanding Big Data", McGraw Hill, 2012.
6. Vignesh Prajapati, "Big Data Analytics with R and Hadoop", Packt Publishing, 2013.
7. Jay Liebowitz, "Big Data and Business Analytics", CRC Press, 2013.
For more insight
Web sources
1. https://www.alliant.edu/blog/4-top-online-resources-data-analytics
2. https://www.coursera.org/articles/big-data-technologies
3. https://careerfoundry.com/en/blog/data-analytics/where-to-find-free-datasets/
THANK YOU
