NoSQL Intro
SYLLABUS
COURSE LEARNING OBJECTIVES:
Big data is a collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools.
The challenges include capture, storage, search, sharing, analysis, and
visualization.
Big data promises greater business intelligence through storing,
processing, and analyzing data that was previously ignored because of the
limitations of traditional data management technologies.
BACKGROUND
Relational databases were the mainstay of business
Web-based applications caused spikes in load
Explosion of social media sites (Facebook, Twitter) with large data needs
Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
Hooking an RDBMS to a web-based application becomes troublesome
WHAT IS NOSQL?
NoSQL (Not Only SQL): Databases that “move beyond” relational data models (i.e., no
tables, limited or no use of SQL)
– Focus on retrieval of data and appending new data (not necessarily tables)
– Focus on key-value data stores that can be used to locate data objects
– Focus on supporting storage of large quantities of unstructured data
– SQL is typically not used for storage or retrieval of data
– ACID guarantees (atomicity, consistency, isolation, durability) are relaxed or absent
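The key-value idea in the list above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular NoSQL product is implemented; the class name `KeyValueStore`, its methods, and the sample keys are all assumptions made for the example.

```python
# Minimal in-memory key-value store (illustrative only; not a real NoSQL engine).
class KeyValueStore:
    def __init__(self):
        self._data = {}  # key -> arbitrary value; no fixed schema is enforced

    def put(self, key, value):
        """Store (or overwrite) the object found under this key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Locate an object by key; return default if the key is unknown."""
        return self._data.get(key, default)

    def append(self, key, item):
        """Append item to the list stored under key, creating the list if needed."""
        self._data.setdefault(key, []).append(item)


store = KeyValueStore()
# Two "user" records with different fields: no table definition is required.
store.put("user:42", {"name": "Ada", "follows": ["user:7"]})
store.put("user:7", {"name": "Bob", "bio": "free-form text"})
# Appending new data rather than updating rows in a table.
store.append("timeline:42", "first post")
store.append("timeline:42", "second post")
```

Note that retrieval goes through the key alone; there is no query language in this sketch, which mirrors the "locate data objects by key" focus described above.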
The Name:
Stands for Not Only SQL
The term NoSQL was introduced by Carlo Strozzi in 1998 to name his
file-based database
It was reintroduced in 2009 by Eric Evans when an event was organized to
discuss open-source distributed databases
Eric states that “… but the whole point of seeking alternatives is that you
need to solve a problem that relational databases are a bad fit for. …”
Key features (advantages):
non-relational
don’t require a schema
data are replicated to multiple nodes (so copies are identical and the system is fault-tolerant) and can be partitioned:
– down nodes easily replaced
– no single point of failure
horizontally scalable
cheap, easy to implement (open-source)
massive write performance
fast key-value access
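Partitioning and replication can be illustrated with a small sketch, assuming a simplified scheme in which each key is hashed to a "home" node and copied to the next node(s) in a fixed ring. The class `ReplicatedStore`, the node names, and the `down` set are hypothetical; real systems use more elaborate placement and repair strategies.

```python
import hashlib


class ReplicatedStore:
    """Toy cluster: each key lives on `replicas` consecutive nodes (assumed scheme)."""

    def __init__(self, node_names, replicas=2):
        # replicas should not exceed the number of nodes, or owners will repeat.
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}  # node name -> local dict
        self.replicas = replicas
        self.down = set()  # names of nodes currently unavailable

    def _owners(self, key):
        # Partitioning: a stable hash of the key picks the first owner.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.order)
        # Replication: the following nodes in the ring hold identical copies.
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for name in self._owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Any live replica can answer, so one down node is not fatal.
        for name in self._owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        return None


cluster = ReplicatedStore(["n0", "n1", "n2"], replicas=2)
cluster.put("user:42", {"name": "Ada"})
first_owner = cluster._owners("user:42")[0]
cluster.down.add(first_owner)       # simulate a node failure
print(cluster.get("user:42"))       # still served by the surviving replica
```

Adding more node names to the constructor spreads keys over more machines, which is the sense in which such a design scales horizontally.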
Disadvantages:
Don’t fully support relational features:
– no join, group by, order by operations (except within partitions)
– no referential integrity constraints across partitions
No declarative query language (e.g., SQL), so more programming is required
Relaxed ACID (see CAP theorem), so fewer guarantees
No easy integration with other applications that support SQL
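To make the "more programming" point concrete, here is a sketch of what a join plus group by might look like when it has to be written in application code. The `users` and `orders` collections and the function `total_per_user` are made-up examples, standing in for data fetched from a store that offers no join operator.

```python
from collections import defaultdict

# Hypothetical data as it might come back from two separate key-value lookups.
users = {"u1": {"name": "Ada"}, "u2": {"name": "Bob"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u2", "total": 15},
    {"order_id": "o3", "user_id": "u1", "total": 20},
]


def total_per_user(users, orders):
    """Hand-written equivalent of a JOIN + GROUP BY + ORDER BY on user name."""
    totals = defaultdict(int)
    for order in orders:
        user = users.get(order["user_id"])  # the "join" is a manual lookup
        if user is None:
            # Without referential integrity, orphaned orders can exist;
            # the application has to decide what to do with them.
            continue
        totals[user["name"]] += order["total"]  # the "group by" and sum
    return sorted(totals.items())               # the "order by"


print(total_per_user(users, orders))  # [('Ada', 50), ('Bob', 15)]
```

In SQL the same result would be a single declarative statement; here the lookup, the aggregation, the ordering, and the handling of missing references are all the programmer's responsibility.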