BIG DATA – DATA ANALYSIS
Lê Hồng Hải
UET-VNUH
Big Data Overview
1 Introduction
2 Big Data storage
3 Big Data processing
4 Streaming
Big Data
Big data is data that contains greater variety, arrives in increasing volumes, and moves with higher velocity. These three characteristics are known as the 3 Vs: volume, variety, and velocity.
Big data architecture components
• Data sources – relational databases, files (e.g., web server log files) produced by applications, and real-time data produced by IoT devices.
• Big data storage – stores high volumes of data of different types before filtering, aggregating, and preparing the data for analysis.
• Real-time message ingestion store – captures and stores real-time messages for stream processing.
• Analytical data store – relational databases for preparing and structuring big data for further analytical querying.
• Big data analytics and reporting, which may include OLAP cubes, ML tools, BI tools, etc. – provides big data insights to end users.
Big data architecture
Big Data Storage
1. Distributed file systems
2. Sharding across multiple databases
3. Key-value storage systems
4. Parallel and distributed databases
Distributed File Systems
A distributed file system stores data across
a large collection of machines, but provides
a single file-system view
Provides redundant storage of massive
amounts of data on cheap and unreliable
computers
◼ Google File System (GFS)
◼ Hadoop Distributed File System (HDFS)
Hadoop File System Architecture
▪ Single Namespace for entire
cluster
▪ Files are broken up into
blocks
• Typically 64 MB block size
• Each block replicated on
multiple DataNodes
▪ Client
• Finds the location of
blocks from NameNode
• Accesses data directly
from DataNode
Hadoop Distributed File System (HDFS)
Data Coherency
◼ Write-once-read-many access model
◼ Client can only append to existing files
Distributed file systems are a good fit for millions of large files
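To make the client's role concrete, here is a minimal sketch that reads a file through Hadoop's Java FileSystem API; the NameNode address and file path are hypothetical placeholders. Under the hood, the client obtains block locations from the NameNode and streams the data directly from DataNodes:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the (hypothetical) NameNode that manages the namespace
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        // Open a file; HDFS resolves which DataNodes hold each block
        try (FSDataInputStream in = fs.open(new Path("/logs/access.log"))) {
            byte[] buffer = new byte[4096];
            int bytesRead = in.read(buffer); // data is streamed from DataNodes
            System.out.println("Read " + bytesRead + " bytes");
        }
    }
}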
Big Data Storage
1. Distributed file systems
2. Sharding across multiple databases
3. Key-value storage systems
4. Parallel and distributed databases
Sharding
Sharding: partition data across multiple
databases
Partitioning is usually done on one or more partitioning attributes (also known as partitioning keys or shard keys), e.g., user ID
◼ E.g., records with key values from 1 to 100,000 on database 1, and records with key values from 100,001 to 200,000 on database 2, etc.
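A minimal sketch of range-based shard routing, assuming the key ranges above; the JDBC URLs are hypothetical placeholders:

import java.util.List;

public class ShardRouter {
    // Each shard covers a contiguous range of key values
    record Shard(long lowKey, long highKey, String jdbcUrl) {}

    private final List<Shard> shards = List.of(
        new Shard(1, 100_000, "jdbc:postgresql://db1/app"),      // hypothetical
        new Shard(100_001, 200_000, "jdbc:postgresql://db2/app") // hypothetical
    );

    // Route a record to the database holding its key range
    public String shardFor(long userId) {
        for (Shard s : shards) {
            if (userId >= s.lowKey() && userId <= s.highKey()) return s.jdbcUrl();
        }
        throw new IllegalArgumentException("No shard for key " + userId);
    }

    public static void main(String[] args) {
        System.out.println(new ShardRouter().shardFor(42)); // routes to db1
    }
}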
Key Value Storage Systems
Key-value storage systems store large
numbers (billions or even more) of small
(KB-MB) sized records
Records are partitioned across multiple machines, and queries are routed by the system to the appropriate machine
Records are also replicated across multiple machines, to ensure availability even if a machine fails
◼ Key-value stores ensure that updates are applied to all replicas, so that their values stay consistent
Key Value Storage Systems
Key-value stores may store
◼ uninterpreted bytes, with an associated key
E.g., Amazon S3, Amazon Dynamo
◼ wide tables (can have arbitrarily many attribute names) with an associated key
▪ Google BigTable, Apache Cassandra, Apache HBase, Amazon DynamoDB
◼ JSON
▪ MongoDB, CouchDB (document model)
Document stores store semi-structured data, typically JSON
Some key-value stores support multiple versions of data, with timestamps/version numbers
Data Representation
An example of a JSON object is:

{
    "ID": "22222",
    "name": {
        "firstname": "Albert",
        "lastname": "Einstein"
    },
    "deptname": "Physics",
    "children": [
        { "firstname": "Hans", "lastname": "Einstein" },
        { "firstname": "Eduard", "lastname": "Einstein" }
    ]
}
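As an illustration, such an object can be parsed and navigated with any JSON library; a minimal sketch using Jackson (the library choice is an assumption):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonExample {
    public static void main(String[] args) throws Exception {
        String json = """
            { "ID": "22222",
              "name": { "firstname": "Albert", "lastname": "Einstein" },
              "deptname": "Physics" }""";
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(json); // parse into a navigable tree
        String first = root.get("name").get("firstname").asText();
        System.out.println(first + " works in " + root.get("deptname").asText());
    }
}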
Key Value Storage Systems
Key-value stores support
◼ put(key, value): store a value with an associated key
◼ get(key): retrieve the stored value associated with the specified key
◼ delete(key): remove the key and its associated value
Some systems also support range queries on key values
Document stores also support queries on non-key attributes
◼ See book for MongoDB queries
◼ Also called NoSQL systems
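A minimal single-node sketch of this put/get/delete interface (a toy that ignores partitioning and replication):

import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ToyKeyValueStore<K, V> {
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();

    public void put(K key, V value) { data.put(key, value); }  // store value under key
    public Optional<V> get(K key) { return Optional.ofNullable(data.get(key)); }
    public void delete(K key) { data.remove(key); }            // remove key and value

    public static void main(String[] args) {
        ToyKeyValueStore<String, String> store = new ToyKeyValueStore<>();
        store.put("user:42", "{\"name\": \"Alice\"}");
        System.out.println(store.get("user:42").orElse("missing"));
        store.delete("user:42");
    }
}

A real system would additionally hash the key to pick a machine and apply each put to every replica.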
Replication and Consistency
Availability (system can run even if parts have
failed) is essential for parallel/distributed
databases
◼ Via replication, so even if a node has failed, another copy
is available
Consistency is important for replicated data
◼ All live replicas have same value, and each read sees
latest version
Network partitions: the network can break into two or more parts, each with active systems that cannot talk to the other parts
In the presence of partitions, one cannot guarantee both availability and consistency
◼ Brewer's CAP "Theorem"
Big data architecture
Big Data Processing
Map-Reduce
Spark
Streaming
The MapReduce Paradigm
Platform for reliable, scalable parallel computing
Abstracts issues of distributed and parallel
environment from programmer
◼ Programmer provides core logic (via map() and
reduce() functions)
◼ System takes care of parallelization of
computation, coordination, etc
MapReduce - Dataflow
The MapReduce Paradigm
Paradigm dates back many decades
◼ But very large scale implementations
running on clusters with 10^3 to 10^4
machines are more recent
◼ Google MapReduce, Hadoop, ...
Data storage/access typically done using
distributed file systems or key-value stores
MapReduce Programming Model
Input: a set of key/value pairs
User supplies two functions:
◼ map(k,v) → list(k1,v1)
◼ reduce(k1, list(v1)) → v2
(k1,v1) is an intermediate key/value pair
Output is the set of (k1,v2) pairs
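Word count is the classic instance of this model: map emits (word, 1) pairs, and reduce sums the counts for each word. A minimal sketch using Hadoop's Java MapReduce API:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // map(k, v) -> list(k1, v1): emit (word, 1) for every word in the line
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // reduce(k1, list(v1)) -> v2: sum the 1s for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate to cut shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}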
Flow of Keys and Values
Flow of keys and values in a MapReduce task
[Diagram: map inputs as (key, value) pairs (mk1, mv1), ..., (mkn, mvn); map outputs as (key, value) pairs such as (rk1, rv1), (rk7, rv2), ...; reduce inputs group the map-output values by key, e.g., rk1 → (rv1, rv7, ...), rk2 → (rv8, rvi, ...)]
https://www.geeksforgeeks.org/how-to-execute-wordcount-program-in-mapreduce-using-cloudera-distribution-hadoop-cdh/
Example
[Diagram: word count over the input "I am a tiger, you are also a tiger". Three map tasks emit (word, 1) pairs; Hadoop sorts the intermediate data; two reduce tasks sum the counts, producing part0 (a,2; also,1; am,1; are,1) and part1 (I,1; tiger,2; you,1). The JobTracker generates three TaskTrackers for the map tasks and two for the reduce tasks]
Parallel Processing of MapReduce Job
[Diagram: the user program is copied to a master and to workers; the master assigns map and reduce tasks. Map workers read partitions (Part 1 ... Part n) of the input file and write intermediate files to local disk; reduce workers remotely read and sort the intermediate partitions and write the output files (File 1 ... File m)]
Map Reduce vs. Databases
Map Reduce widely used for parallel
processing
◼ Google, Yahoo, and hundreds of other companies
◼ Example uses: compute PageRank, build
keyword indices, do data analysis of web click
logs, ….
Many real-world uses of MapReduce
cannot be expressed in SQL
But many computations are much easier to
express in SQL
Map Reduce vs. Databases (Cont.)
Relational operations (select, project, join,
aggregation, etc.) can be expressed using
Map Reduce
SQL queries can be translated into Map
Reduce infrastructure for execution
◼ Apache Hive (Hive SQL), Apache Pig (Pig Latin), Microsoft SCOPE
Where is MapReduce Inefficient?
Long pipelines sharing data
Interactive applications
Streaming applications
(MapReduce would need to write and read
from disk a lot)
Spark
The key idea of Spark is the Resilient Distributed Dataset (RDD)
It supports in-memory computation
RDD Spark
Resilient Distributed Dataset (RDD)
abstraction
◼ Collection of records that can be stored
across multiple machines
A read-only, partitioned collection of records (like a DFS), together with a record of how the dataset was created as a combination of transformations from other datasets
Word Count in Spark
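A minimal word-count sketch using Spark's Java RDD API; the input and output paths are placeholders:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs://.../input.txt"); // placeholder path
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // words
                .mapToPair(word -> new Tuple2<>(word, 1)) // (word, 1) pairs
                .reduceByKey(Integer::sum);               // sum counts per word
            counts.saveAsTextFile("hdfs://.../output");   // placeholder path
        }
    }
}

Each RDD in the chain records its lineage, so a lost partition can be recomputed from its parent datasets.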
Spark DataFrames and Datasets
RDDs in Spark can be typed in programs, but not dynamically
The Dataset type allows types to be specified dynamically
Row is a generic row type, with attribute names
◼ In the code below, the attribute names/types of instructor and department are inferred from the files read
Spark DataFrames and Datasets
Operations such as filter, join, groupBy, and agg are defined on Datasets, and can execute in parallel

Dataset<Row> instructor = spark.read().parquet("...");
Dataset<Row> department = spark.read().parquet("...");

// count() is the aggregate function from org.apache.spark.sql.functions
instructor.filter(instructor.col("salary").gt(100000))
    .join(department, instructor.col("dept name")
        .equalTo(department.col("dept name")))
    .groupBy(department.col("building"))
    .agg(count(instructor.col("ID")));
Streaming Data
Streaming Data and Applications
Streaming data refers to data that
arrives in a continuous fashion
Applications include:
◼ Stock market: stream of trades
◼ Sensors: sensor readings
Internet of things
◼ Network monitoring data
◼ Social media: tweets and posts can be viewed
as a stream
Queries on streams can be very useful
◼ Monitoring, alerts, automated triggering of
actions
Publish Subscribe Systems
Publish-subscribe (pub-sub) systems
provide a convenient abstraction for
processing streams
◼ Tuples in a stream are published to a topic
◼ Consumers subscribe to topics
Apache Kafka
Apache Kafka is a popular parallel pub-sub
system widely used to manage streaming data
Parallel pub-sub systems allow tuples in a
topic to be partitioned across multiple
machines
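A minimal sketch of publishing a tuple to a Kafka topic with the Java producer client; the broker address and topic name are hypothetical:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SensorPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("sensor-17") determines which partition the tuple goes to
            producer.send(new ProducerRecord<>("sensor-readings", "sensor-17", "23.5"));
        }
    }
}

A consumer would subscribe to sensor-readings and receive the stream, partition by partition.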
Big data architecture
Data Analytics
1. Overview
2. Data Warehousing (DW)
3. Online Analytical Processing (OLAP)
4. Data Mining
Overview
Data analytics: the processing of data to
infer patterns, correlations, or models for
prediction
Primarily used to make business decisions
◼ E.g., what product to suggest for purchase
◼ E.g., what products to manufacture/stock, in
what quantity
Critical for businesses today
Common steps in data analytics
Gather data from multiple sources into one
location
Data warehouses also integrate data into a
common schema
Data often needs to be extracted from source formats, transformed into the common schema, and loaded into the data warehouse (ETL)
Data Analytics
Generate aggregates and reports
summarizing data
◼ Dashboards showing graphical charts/reports
◼ Online analytical processing (OLAP)
systems allow interactive querying
◼ Statistical analysis using tools such as
R/SAS/SPSS
Build predictive models and use the
models for decision making
Overview (Cont.)
Predictive models are widely used today
◼ E.g., use customer profile features and the
history of a customer to predict the likelihood
of default on a loan
◼ E.g., use history of sales to predict future sales
Other examples of business decisions:
◼ What items to stock?
◼ What insurance premium to charge?
◼ To whom to send advertisements?
Overview (Cont.)
Machine learning techniques are key to
finding patterns in data and making
predictions
Data mining extends techniques
developed by machine-learning
communities to run them on very large
datasets
The term business intelligence (BI) is a synonym for data analytics
Data Warehousing
A data warehouse is a repository (archive)
of information gathered from multiple
sources, stored under a unified schema, at a
single site
Warehouse Design issues
Data transformation and data
cleansing
◼E.g., correct mistakes in addresses
(misspellings, zip code errors)
How to propagate updates
What data to summarize
Multidimensional Data
Data in warehouses can usually be divided
into
◼ Fact tables, which are large
E.g., sales(item_id, store_id,
customer_id, date, number, price)
◼ Dimension tables, which are relatively
small
Store extra information about stores,
items, etc.
Fact Tables
Attributes of fact tables can usually be viewed as
◼ Measure attributes
measure some value, and can be
aggregated upon
e.g., the attributes number or price of
the sales relation
◼ Dimension attributes
dimensions on which measure attributes
are viewed
Data Warehouse Star Schema
More on Data Warehouse Star Schema
Multidimensional Data and Warehouse Schemas
More complicated schema structures
◼ Snowflake schema: multiple levels of
dimension tables
Data lakes
Some applications do not find it worthwhile
to bring data to a common schema
◼ Data lakes are repositories which allow data to
be stored in multiple formats, without schema
integration
◼ Less upfront effort, but more effort during
querying
Database Support for Data Warehouses
Data in warehouses is usually append-only, not updated, so concurrency control overheads can be avoided
Data warehouses often use column-
oriented storage
Column-oriented storage
Arrays are compressed, reducing storage,
IO and memory costs significantly
Queries can fetch only attributes that they
care about, reducing IO and memory cost
Data warehouses often use parallel storage
and query processing infrastructure
Data Analysis and OLAP
Online Analytical Processing (OLAP)
Interactive analysis of data, allowing data
to be summarized and viewed in different
ways in an online fashion (with negligible
delay)
Cross Tabulation
The table below is an example of a cross-
tabulation (cross-tab), also referred to as a
pivot-table
Data Cube
A data cube is a multidimensional
generalization of a cross-tab
Can have n dimensions; we show 3 below
Cross-tabs can be used as views on a data
cube
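As an illustration, Spark's Dataset API can compute cube-style aggregates directly via cube(), which aggregates over every combination of the listed dimensions, including subtotals and the grand total. The sales dataset below follows the fact-table example earlier; the input path is a placeholder:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SalesCube {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("cube").getOrCreate();
        Dataset<Row> sales = spark.read().parquet("..."); // placeholder path
        // Group by every combination of (item_id, store_id), including
        // per-item totals, per-store totals, and the grand total
        Dataset<Row> cube = sales
            .cube(col("item_id"), col("store_id"))
            .agg(sum(col("number")), sum(col("price")));
        cube.show();
    }
}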
Online Analytical Processing Operations
Pivoting: changing the dimensions used in a
cross-tab
Slicing: creating a cross-tab for fixed values
only
Rollup: moving from finer-granularity data to
a coarser granularity
Drill down: The opposite operation - that of
moving from coarser-granularity data to finer-
granularity data
Hierarchies on Dimensions
Hierarchy on dimension attributes: lets
dimensions be viewed at different levels of
detail
Cross Tabulation With Hierarchy
Cross-tabs can be easily extended to deal
with hierarchies
Can drill down or roll up on a hierarchy
E.g. hierarchy: item_name → category
Reporting and Visualization
Reporting tools help create formatted
reports with tabular/graphical
representation of data
Data visualization tools help create
interactive visualization of data
◼ E.g., PowerBI, Tableau, FusionChart, plotly,
Datawrapper, Google Charts, etc.
Data Mining
Data mining is the process of semi-
automatically analyzing large databases to
find useful patterns
Some types of knowledge can be represented
as rules
More generally, knowledge is discovered by
applying machine learning techniques to
past instances of data to form a model
Types of Data Mining Tasks
Prediction based on past history
◼ Predict if a credit card applicant poses a good
credit risk, based on some attributes (income,
job type, age, ..) and past history
Some examples of prediction mechanisms:
◼ Classification
Items (with associated attributes) belong to one of
several classes
Training instances have attribute values and classes
provided
◼ Regression formulae
Given a set of mappings for an unknown function,
predict the function result for a new parameter value
THANK YOU