
UNIT II

ETL AND OLAP TECHNOLOGY


What is ETL – ETL Vs ELT – Types of Data warehouses - Data warehouse Design
and Modeling - Delivery Process - Online Analytical Processing (OLAP) -
Characteristics of OLAP - Online Transaction Processing (OLTP) Vs OLAP -
OLAP operations- Types of OLAP- ROLAP Vs MOLAP Vs HOLAP.
What is ETL
ETL stands for Extract, Transform, Load. It is a process used in data
warehousing to extract data from various sources, transform it into a format
suitable for loading into a data warehouse, and then load it into the warehouse.
The process of ETL can be broken down into the following three stages:
1. Extract: The first stage in the ETL process is to extract data from various
sources such as transactional systems, spreadsheets, and flat files. This step
involves reading data from the source systems and storing it in a staging
area.
2. Transform: In this stage, the extracted data is transformed into a format
that is suitable for loading into the data warehouse. This may involve
cleaning and validating the data, converting data types, combining data
from multiple sources, and creating new data fields.
3. Load: After the data is transformed, it is loaded into the data warehouse.
This step involves creating the physical data structures and loading the data
into the warehouse.
The ETL process is an iterative process that is repeated as new data is added to
the warehouse. The process is important because it ensures that the data in the
data warehouse is accurate, complete, and up-to-date. It also helps to ensure that
the data is in the format required for data mining and reporting.
Additionally, there are many different ETL tools and technologies available, such
as Informatica, Talend, DataStage, and others, that can automate and simplify the
ETL process.
In other words, ETL is a process in which an ETL tool extracts the
data from various data source systems, transforms it in the staging area, and then
finally loads it into the data warehouse system.
Let us understand each step of the ETL process in-depth:
1. Extraction:
The first step of the ETL process is extraction. In this step, data from
various source systems, which may be in formats such as relational
databases, NoSQL stores, XML, and flat files, is extracted into the staging
area. It is important to extract the data from the various source systems and
store it in the staging area first, rather than directly in the data warehouse,
because the extracted data arrives in various formats and can also be
corrupted. Loading it directly into the data warehouse could therefore
damage the warehouse, and rollback would be much more difficult. This
makes extraction one of the most important steps of the ETL process.
2. Transformation:
The second step of the ETL process is transformation. In this step, a set of
rules or functions is applied to the extracted data to convert it into a single
standard format (see the sketch after this list). It may involve the following
processes/tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling up the NULL values with some default values,
mapping U.S.A, United States, and America into USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally a key
attribute).
3. Loading:
The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. Sometimes the
data is loaded into the data warehouse very frequently, and sometimes it is
loaded after longer but regular intervals. The rate and period of loading
depend solely on the requirements and vary from system to system.
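The transformation tasks above can be pictured with a small, illustrative pandas
sketch. The staging table, its column names, and the values are all invented for
the example; in practice the rules come from the business requirements.

    import pandas as pd

    # Hypothetical records sitting in the staging area after extraction.
    staged = pd.DataFrame({
        "cust_id":   [2, 1, 3],
        "full_name": ["Asha Kumar", "Ben Rao", None],
        "city":      ["Dallas", "Austin", "Chennai"],
        "country":   ["U.S.A", "United States", "India"],
        "amount":    ["100", "250", "75"],        # arrives as text
    })

    # Cleaning: fill NULLs with a default; map country variants to one value.
    staged["full_name"] = staged["full_name"].fillna("UNKNOWN")
    staged["country"] = staged["country"].replace(
        {"U.S.A": "USA", "United States": "USA", "America": "USA"})

    # Converting data types: textual amounts become numeric.
    staged["amount"] = staged["amount"].astype(float)

    # Joining: multiple attributes combined into one.
    staged["location"] = staged["city"] + ", " + staged["country"]

    # Splitting: a single attribute split into multiple attributes.
    parts = staged["full_name"].str.split(" ", n=1, expand=True)
    staged["first_name"], staged["last_name"] = parts[0], parts[1]

    # Filtering: keep only the attributes the warehouse needs,
    # then Sorting: order tuples on the key attribute.
    transformed = staged[["cust_id", "location", "amount"]].sort_values("cust_id")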
The ETL process can also use the pipelining concept: as soon as some data is
extracted, it can be transformed, and during that period new data can be
extracted. Likewise, while the transformed data is being loaded into the data
warehouse, the already extracted data can be transformed. The block diagram of
the pipelining of the ETL process is shown below:
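A record-at-a-time flavour of this idea can be sketched with Python generators.
This is only a single-threaded illustration of the interleaving (a real pipelined
ETL tool runs the stages concurrently), and all names here are hypothetical.

    def extract_rows(source):
        # Hand over one raw record at a time from a hypothetical source.
        for raw in source:
            yield raw

    def transform_rows(rows):
        # Transform records as they arrive; extraction of later records
        # is interleaved with transformation of earlier ones.
        for row in rows:
            yield {"name": row["name"].strip().title(),
                   "amount": float(row["amount"])}

    def load_rows(rows, warehouse):
        # Load each transformed record as soon as it is ready.
        for row in rows:
            warehouse.append(row)

    source = [{"name": " alice ", "amount": "10"},
              {"name": " bob ",   "amount": "20"}]
    warehouse = []
    load_rows(transform_rows(extract_rows(source)), warehouse)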

ETL Tools: The most commonly used ETL tools are Hevo, Sybase, Oracle
Warehouse Builder, CloverETL, and MarkLogic.
Data Warehouses: Most commonly used Data Warehouses are Snowflake,
Redshift, BigQuery, and Firebolt.
ADVANTAGES AND DISADVANTAGES:
Advantages of ETL process in data warehousing:
1. Improved data quality: ETL process ensures that the data in the data
warehouse is accurate, complete, and up-to-date.
2. Better data integration: ETL process helps to integrate data from multiple
sources and systems, making it more accessible and usable.
3. Increased data security: ETL process can help to improve data security
by controlling access to the data warehouse and ensuring that only
authorized users can access the data.
4. Improved scalability: ETL process can help to improve scalability by
providing a way to manage and analyze large amounts of data.
5. Increased automation: ETL tools and technologies can automate and
simplify the ETL process, reducing the time and effort required to load and
update data in the warehouse.
Disadvantages of ETL process in data warehousing:
1. High cost: ETL process can be expensive to implement and maintain,
especially for organizations with limited resources.
2. Complexity: ETL process can be complex and difficult to implement,
especially for organizations that lack the necessary expertise or resources.
3. Limited flexibility: ETL process can be limited in terms of flexibility, as
it may not be able to handle unstructured data or real-time data streams.
4. Limited scalability: ETL process can be limited in terms of scalability, as
it may not be able to handle very large amounts of data.
5. Data privacy concerns: ETL process can raise concerns about data
privacy, as large amounts of data are collected, stored, and analyzed.
Overall, the ETL process is an essential process in data warehousing that helps to
ensure that the data in the data warehouse is accurate, complete, and up-to-date.
However, it also comes with its own set of challenges and limitations, and
organizations need to carefully consider the costs and benefits before
implementing it.

ETL Vs ELT
In managing and analyzing data, there are two main approaches you might hear
about: Extraction, Load, and Transform (ELT) and Extraction, Transform, and
Load (ETL). These methods help move data from various sources into a data
warehouse, making it ready for analysis. Understanding the differences between
ELT and ETL can help you choose the right approach for your needs.
What is Extraction, Load, and Transform (ELT)?
ELT is a method where you first pull raw data from different sources, then put it
into a data warehouse, and finally transform it into the format needed for analysis.
Unlike ETL, where data is transformed before it gets stored, ELT keeps the data
in its original form until it is needed.
Brief History of ETL & ELT
ETL (Extraction, Transform, Load) was the traditional way of handling data. Over
time, with the increase in data volume and the need for real-time processing, ELT
was developed as a more flexible alternative. ELT builds on ETL but changes the
order of operations to better use modern data storage technologies.
ELT Process
Extraction, Load and Transform (ELT) is the technique of extracting raw data
from the source and storing it in data warehouse of the target server and preparing
it for endstream users.
ELT comprises of 3 different operations performed on the data –
1. Extract
Extracting data is the technique of identifying data from one or more
sources. The sources may be databases, files, ERP, CRM or any other
useful source of data.
2. Load
Loading is the process of storing the extracted raw data in data warehouse
or data lakes.
3. Transform
Data transformation is the process in which the raw data source is
transformed to the target format required for analysis.
Data from the sources is extracted and stored in the data warehouse. The entire
data set is not transformed; only the required transformation is done when
necessary. Raw data can be retrieved from the warehouse at any time, and the
data transformed as required is then sent forward for analysis. When you use
ELT, you move the entire data set as it exists in the source systems to the target.
This means that you have the raw data at your disposal in the data warehouse, in
contrast to the ETL approach.
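A minimal sketch of the ELT pattern, using Python's sqlite3 module as a stand-in
for the warehouse; the table and column names are invented for the example. Raw
data is loaded first, and the transformation is just a SQL query run inside the
"warehouse" whenever the analysis needs it.

    import sqlite3

    con = sqlite3.connect(":memory:")   # stand-in for the target warehouse

    # Extract + Load: raw data lands in the warehouse untransformed.
    con.execute("CREATE TABLE raw_sales (region TEXT, amount TEXT)")
    con.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                    [("north", "100"), ("south", "250"), ("north", "50")])

    # Transform: performed later, inside the warehouse, only as required.
    cur = con.execute("""
        SELECT region, SUM(CAST(amount AS REAL)) AS total
        FROM raw_sales
        GROUP BY region
        ORDER BY region
    """)
    print(cur.fetchall())   # [('north', 150.0), ('south', 250.0)]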
ETL Process
ETL is the traditional technique of extracting raw data, transforming it as
required for the users, and storing it in data warehouses. ELT was developed
later, with ETL as its base. The three operations in ETL and ELT are the same;
only their order of processing varies. This change in sequence was made to
overcome some drawbacks.
1. Extract
It is the process of extracting raw data from all available data sources such
as databases, files, ERP (Enterprise Resource Planning), CRM (Customer
Relationship Management) or any other.
2. Transform
The extracted data is immediately transformed as required by the user.
3. Load
The transformed data is then loaded into the data warehouse from where
the users can access it.
The data collected from the sources is stored directly in the staging area. The
required transformations are performed on the data in the staging area. Once the
data is transformed, the resultant data is stored in the data warehouse. The main
drawback of the ETL architecture is that once the transformed data is stored in
the warehouse, it cannot be modified again, whereas in ELT a copy of the raw
data is always available in the warehouse and only the required data is
transformed when needed.
Difference between ELT and ETL
Here are the main differences between ELT and ETL:

• Hardware: ELT tools do not require additional hardware, whereas ETL tools
require specific hardware with their own engines to perform transformations.
• Data store: ELT mostly uses Hadoop or NoSQL databases to store data and
rarely an RDBMS, whereas ETL uses an RDBMS exclusively to store data.
• Loading: In ELT, all components are in one system, so loading is done only
once; because ETL uses a staging area, extra time is required to load the data.
• Transformation time: In ELT, the time to transform data is independent of the
size of the data; in ETL, the system has to wait for large volumes of data, and
as the size of the data increases, transformation time also increases.
• Cost: ELT is cost-effective and available to all businesses using SaaS
solutions; ETL is not cost-effective for small and medium businesses.
• Users: Data transformed with ELT is used by data scientists and advanced
analysts; data transformed with ETL is used by report readers and SQL coders.
• Views: ELT creates ad hoc views at low cost for building and maintaining; in
ETL, views are created based on multiple scripts, and deleting a view means
deleting data.
• Data types: ELT is best for unstructured and non-relational data, ideal for
data lakes, and suited to very large amounts of data; ETL is best for relational
and structured data and better for small to medium amounts of data.

Which is Better ETL or ELT?


The choice between ETL and ELT depends on your specific needs and
requirements –
• ETL works well for smaller datasets and structured data where you need
the data transformed right away. It often requires special hardware and can
be less flexible when handling large amounts of data.
• ELT is better for large datasets and unstructured or non-relational data. It’s
more flexible and cost-effective, especially with cloud-based data
solutions. With ELT, you can store raw data and transform it as needed.
Both ELT and ETL are important methods for managing and analyzing data. ETL
is traditional but has limitations, while ELT offers more flexibility and efficiency,
particularly for large and diverse datasets. Knowing how each method works can
help us decide which is best for our data needs.
ELT is generally more cost-effective for small to medium-sized businesses,
especially when using cloud-based solutions, because it doesn’t need specific
hardware.
ELT can handle both structured and unstructured data. It’s especially useful for
large volumes of data, whether it’s structured or not.
In ELT, transformation happens after the data is loaded into the warehouse,
allowing for more flexibility. In ETL, data is transformed before it reaches the
warehouse.
ELT often uses modern storage systems like Hadoop or NoSQL databases, but it
can also work with traditional RDBMS systems.

Types of Data warehouses


There are three main types of data warehouse.
Enterprise Data Warehouse (EDW)
This type of warehouse serves as a key or central database that facilitates
decision-support services throughout the enterprise. The advantage to this type of
warehouse is that it provides access to cross-organizational information, offers a
unified approach to data representation, and allows running complex queries.
An Enterprise Data Warehouse (EDW) is a centralized repository that stores
and manages an organization's historical data from various operational
systems. This data is integrated, transformed, and made accessible for analysis
and reporting across the entire enterprise.
Key Characteristics of an EDW
• Centralized Repository: All data is stored in a single location.
• Integrated Data: Data from multiple sources is combined into a consistent
format.
• Historical Data: Focuses on storing past data for analysis and trend
identification.
• Subject-Oriented: Data is organized around subjects like customers,
products, or sales.
• Time-Variant: Data is recorded with a time dimension to track changes
over time.
• Non-Volatile: Data is not overwritten or deleted, allowing for historical
analysis.
Components of an EDW
• Data Sources: Operational systems like ERP, CRM, HR, and others.
• Data Extraction, Transformation, and Loading (ETL): Processes data
from sources into the EDW.
• Metadata: Information about the data, such as its structure, meaning, and
quality.
• Data Warehouse: The central repository for storing and managing data.
• Data Mart: Smaller subsets of the EDW focused on specific business
units.
• Business Intelligence (BI) Tools: Software for querying, analyzing, and
visualizing data.

Benefits of an EDW
• Improved Decision Making: Provides a comprehensive view of the
business for informed decisions.
• Enhanced Customer Satisfaction: Enables better understanding of
customer behavior and preferences.
• Increased Operational Efficiency: Identifies opportunities for process
optimization.
• Competitive Advantage: Uncovers market trends and opportunities.
• Cost Reduction: Optimizes resource allocation and reduces operational
costs.
Challenges of Implementing an EDW
• High Cost: Requires significant investment in hardware, software, and
personnel.
• Complexity: Integrating and transforming data from multiple sources can
be challenging.
• Data Quality: Ensuring data accuracy and consistency is crucial.
• Performance: Handling large volumes of data requires efficient query
processing.
Modern EDW Trends
• Cloud-Based EDWs: Leveraging cloud platforms for scalability and cost-
efficiency.
• Data Lakes: Combining structured and unstructured data for advanced
analytics.
• Data Governance: Implementing robust data management policies and
procedures.
• Self-Service BI: Empowering business users with data access and analysis
capabilities.

Operational Data Store (ODS)


This type of data warehouse refreshes in real-time. It is often preferred for routine
activities like storing employee records. It is required when data warehouse
systems do not support the reporting needs of the business.
An Operational Data Store (ODS) is a central database that provides a snapshot
of the latest data from multiple transactional systems for operational reporting.
It's essentially a staging area between operational systems and the data
warehouse.
Key Characteristics of an ODS
• Real-time or near-real-time updates: Data is refreshed frequently to
reflect current business operations.
• Detailed data: Contains granular, operational data, often in its original
format.
• Short data retention: Typically stores data for a limited period (days or
weeks).
• Subject-oriented: Data is organized around business subjects or
processes.
• Integrated: Combines data from various operational systems.
• Volatile: Data is constantly changing to reflect current operations.
Purpose of an ODS
• Operational reporting: Provides data for day-to-day operational
decision-making.
• Data cleansing and transformation: Prepares data for loading into the
data warehouse.
• Data quality checks: Ensures data integrity before it's moved to the data
warehouse.
• Short-term analysis: Supports ad-hoc analysis on current data.
Comparison to Data Warehouse
While both ODS and data warehouse store integrated data, they serve different
purposes.
Feature-by-feature comparison:
• Data freshness: ODS - real-time or near-real-time; DW - historical.
• Data granularity: ODS - detailed; DW - summarized.
• Data volume: ODS - smaller; DW - larger.
• Data retention: ODS - short-term; DW - long-term.
• Purpose: ODS - operational reporting, data staging; DW - analytical
reporting, decision support.

Benefits of an ODS
• Improved operational efficiency: Provides timely data for operational
decisions.
• Enhanced data quality: Cleanses and transforms data before loading into
the data warehouse.
• Faster reporting: Reduces query response time for operational reports.
• Reduced load on operational systems: Offloads reporting workload from
transactional systems.

Data Mart
A data mart is a subset of a data warehouse built to serve a particular
department, region, or business unit. Every department of a business has a central
repository or data mart to store data. The data from the data mart is stored in the
ODS periodically. The ODS then sends the data to the EDW, where it is stored
and used.
A data mart is a subset of a data warehouse focused on a specific line of
business, department, or subject area. It contains a selected portion of data
from the larger data warehouse, tailored to the specific needs of a particular group
of users.

Key Characteristics of a Data Mart


• Focused: Contains data relevant to a specific business unit or department.
• Smaller scope: Less complex and easier to manage than a full data
warehouse.
• Faster development: Can be implemented more quickly than a data
warehouse.
• Dependent on data warehouse: Often extracts data from the data
warehouse.
• Improved query performance: Optimized for the specific needs of the
target users.
Benefits of a Data Mart
• Faster time-to-market: Enables quicker access to insights for specific
business units.
• Improved decision making: Provides relevant data for targeted analysis.
• Increased user satisfaction: Delivers information tailored to specific user
needs.
• Lower cost: Typically less expensive to implement and maintain than a
data warehouse.
Types of Data Marts
• Dependent data mart: Extracts data from a central data warehouse.
• Independent data mart: Builds its own data warehouse from multiple
sources.
• Hybrid data mart: Combines elements of both dependent and
independent data marts.
Example
A retail company might have separate data marts for sales, marketing, and
finance. The sales data mart would contain data on customers, products, sales
transactions, and promotions, while the marketing data mart would focus on
customer demographics, campaign performance, and marketing ROI.
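A dependent data mart of this kind can be sketched as a department-specific view
carved out of the central warehouse; the warehouse table and its columns below
are invented for the illustration.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE warehouse_sales (
                       dept TEXT, product TEXT, amount REAL)""")
    con.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)",
                    [("sales", "phone", 100.0),
                     ("marketing", "ad campaign", 40.0)])

    # The sales data mart: a focused subset of the warehouse,
    # exposed here as a view tailored to one department.
    con.execute("""CREATE VIEW sales_mart AS
                   SELECT product, amount
                   FROM warehouse_sales
                   WHERE dept = 'sales'""")
    print(con.execute("SELECT * FROM sales_mart").fetchall())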
Data warehouse Design and Modeling
Data Warehouse Design
A data warehouse is a single data repository where records from multiple data
sources are integrated for online business analytical processing (OLAP). This
implies a data warehouse needs to meet the requirements of all the business
stages within the entire organization. Thus, data warehouse design is a hugely
complex, lengthy, and hence error-prone process. Furthermore, business
analytical functions change over time, which results in changes in the
requirements for the systems. Therefore, data warehouse and OLAP systems are
dynamic, and the design process is continuous.
In industry, data warehouse design takes an approach different from view
materialization. It sees data warehouses as database systems with particular
needs, such as answering management-related queries. The target of the design
becomes how the records from multiple data sources should be extracted,
transformed, and loaded (ETL) to be organized in a database as the data
warehouse.
There are two approaches
1. top-down approach
2. bottom-up approach
Top-down Design Approach
In the "Top-Down" design approach, a data warehouse is described as a subject-
oriented, time-variant, non-volatile and integrated data repository for the entire
enterprise. Data from different sources are validated, reformatted and saved in a
normalized (up to 3NF) database as the data warehouse. The data warehouse
stores "atomic" information, the data at the lowest level of granularity, from
which dimensional data marts can be built by selecting the data required for
specific business subjects or particular departments. This is a data-driven
approach, as the information is gathered and integrated first, and the business
requirements by subject for building data marts are formulated afterwards. The
advantage of this method is that it supports a single integrated data source, so the
data marts built from it will be consistent where they overlap.
Advantages of top-down design
• Data Marts are loaded from the data warehouses.
• Developing a new data mart from the data warehouse is very easy.
Disadvantages of top-down design
• This technique is inflexible to changing departmental needs.
• The cost of implementing the project is high.
Bottom-Up Design Approach
In the "Bottom-Up" approach, a data warehouse is described as "a copy of
transaction data specifically architected for query and analysis," termed the star
schema. In this approach, a data mart is created first to provide the necessary
reporting and analytical capabilities for particular business processes (or
subjects). Thus it is a business-driven approach, in contrast to Inmon's data-
driven approach.
Data marts include the lowest-grain data and, if needed, aggregated data too.
Instead of a normalized database for the data warehouse, a denormalized
dimensional database is adopted to meet the data delivery requirements of data
warehouses. Using this method, to use the set of data marts as the enterprise data
warehouse, data marts should be built with conformed dimensions in mind,
meaning that common objects are represented the same way in different data
marts. The conformed dimensions connect the data marts to form a data
warehouse, which is generally called a virtual data warehouse.
The advantage of the "bottom-up" design approach is that it has quick ROI, as
developing a data mart, a data warehouse for a single subject, takes far less time
and effort than developing an enterprise-wide data warehouse. Also, the risk of
failure is even less. This method is inherently incremental. This method allows
the project team to learn and grow.

Advantages of bottom-up design


• Documents can be generated quickly.
• The data warehouse can be extended to accommodate new business units;
this is just a matter of developing new data marts and then integrating them
with the other data marts.
Disadvantages of bottom-up design
• The locations of the data warehouse and the data marts are reversed in the
bottom-up approach design.
Difference between Top-Down Design Approach and Bottom-Up Design
Approach

• Top-down breaks the vast problem into smaller subproblems; bottom-up solves
the essential low-level problems and integrates them into a higher one.
• Top-down is inherently architected, not a union of several data marts;
bottom-up is inherently incremental and can schedule essential data marts first.
• Top-down keeps a single, central store of information about the content;
bottom-up stores information departmentally.
• Top-down has centralized rules and control; bottom-up has departmental rules
and control.
• Top-down includes redundant information; in bottom-up, redundancy can be
removed.
• Top-down may see quick results if implemented with repetitions; bottom-up
has less risk of failure, a favourable return on investment, and proof of
techniques.

Data Warehouse Modeling


Data warehouse modeling is the process of designing the schemas of the detailed
and summarized information of the data warehouse. The goal of data warehouse
modeling is to develop a schema describing the reality, or at least a part of it,
which the data warehouse is needed to support.
Data warehouse modeling is an essential stage of building a data warehouse for
two main reasons. Firstly, through the schema, data warehouse clients can
visualize the relationships among the warehouse data, to use them with greater
ease. Secondly, a well-designed schema allows an effective data warehouse
structure to emerge, to help decrease the cost of implementing the warehouse and
improve the efficiency of using it.
Data modeling in data warehouses is different from data modeling in operational
database systems. The primary function of data warehouses is to support DSS
processes. Thus, the objective of data warehouse modeling is to make the data
warehouse efficiently support complex queries on long term information.
In contrast, data modeling in operational database systems targets efficiently
supporting simple transactions in the database, such as retrieving, inserting,
deleting, and changing data. Moreover, data warehouses are designed for
customers with general knowledge of the enterprise's information, whereas
operational database systems are more oriented toward use by software
specialists for creating distinct applications.
The data warehouse model is illustrated in the given diagram.

The data within the warehouse itself has a particular architecture, with the
emphasis on various levels of summarization, as shown in the figure:
The current detail record is central in importance as it:
o Reflects the most current happenings, which are commonly the most
stimulating.
o It is voluminous, as it is saved at the lowest level of granularity.
o It is (almost) always saved on disk storage, which is fast to access but
expensive and difficult to manage.
o Older detail data is stored on some form of mass storage; it is infrequently
accessed and kept at a level of detail consistent with current detailed data.
o Lightly summarized data is data extracted from the low level of detail found
at the current detailed level, and it is usually stored on disk storage. When
building the data warehouse, we have to remember over what unit of time the
summarization is done and what components or attributes the summarized
data will contain.
o Highly summarized data is compact and directly available and can even be
found outside the warehouse.
Metadata is the final element of the data warehouse, and it is of a different
dimension, in that it is not data drawn from the operational environment; rather,
it is used as:
o A directory to help the DSS analyst locate the contents of the data
warehouse.
o A guide to the mapping of records as the data is transformed from the
operational environment to the data warehouse environment.
o A guide to the methods used for summarization between the current,
detailed data and the lightly summarized data, and between the lightly
summarized data and the highly summarized data, etc.
Data Modeling Life Cycle
In this section, we define a data modeling life cycle. It is a straightforward
process of transforming the business requirements to fulfil the goals for storing,
maintaining, and accessing the data within IT systems. The result is a logical and
physical data model for an enterprise data warehouse.
The objective of the data modeling life cycle is primarily the creation of a storage
area for business information. That area comes from the logical and physical data
modeling stages, as shown in Figure:

Conceptual Data Model


A conceptual data model recognizes the highest-level relationships between the
different entities.
Characteristics of the conceptual data model
o It contains the essential entities and the relationships among them.
o No attribute is specified.
o No primary key is specified.
We can see that the only data shown via the conceptual data model is the entities
that define the data and the relationships between those entities; no other detail is
shown through the conceptual data model.

Logical Data Model


A logical data model describes the information in as much structure as possible,
without regard to how it will be physically implemented in the database. The
primary objective of logical data modeling is to document the business data
structures, processes, rules, and relationships in a single view - the logical data
model.
Features of a logical data model
o It involves all entities and relationships among them.
o All attributes for each entity are specified.
o The primary key for each entity is stated.
o Referential Integrity is specified (FK Relation).
The steps for designing the logical data model are as follows:
o Specify primary keys for all entities.
o List the relationships between different entities.
o List all attributes for each entity.
o Normalization.
o No data types are listed

Physical Data Model


Physical data model describes how the model will be presented in the database.
A physical database model demonstrates all table structures, column names, data
types, constraints, primary key, foreign key, and relationships between tables. The
purpose of physical data modeling is the mapping of the logical data model to the
physical structures of the RDBMS system hosting the data warehouse. This
includes defining physical RDBMS structures, such as tables and the data types
to use when storing the information. It may also include the definition of new
data structures for enhancing query performance.
Characteristics of a physical data model
o Specification of all tables and columns.
o Foreign keys are used to identify relationships between tables.
The steps for physical data model design are as follows:
o Convert entities to tables.
o Convert relationships to foreign keys.
o Convert attributes to columns.
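These three steps can be made concrete with a small sqlite3 sketch; the customer
and sales_order entities, their attributes, and the relationship between them are
invented for the illustration.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")   # enforce referential integrity

    # Entity -> table; attributes -> typed columns; primary key stated.
    con.execute("""CREATE TABLE customer (
                       customer_id INTEGER PRIMARY KEY,
                       name        TEXT NOT NULL)""")

    # Relationship -> foreign key from the child table to the parent.
    con.execute("""CREATE TABLE sales_order (
                       order_id    INTEGER PRIMARY KEY,
                       customer_id INTEGER NOT NULL
                           REFERENCES customer (customer_id),
                       amount      REAL)""")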
Types of Data Warehouse Models

Enterprise Warehouse
An enterprise warehouse collects all of the records about subjects spanning the
entire organization. It supports corporate-wide data integration, usually from one
or more operational systems or external data providers, and it is cross-functional
in scope. It generally contains detailed as well as summarized information and
can range in size from a few gigabytes to hundreds of gigabytes, terabytes, or
beyond.
An enterprise data warehouse may be implemented on traditional mainframes,
UNIX super servers, or parallel architecture platforms. It requires extensive
business modeling and may take years to develop and build.
Data Mart
A data mart includes a subset of corporate-wide data that is of value to a specific
collection of users. The scope is confined to particular selected subjects. For
example, a marketing data mart may restrict its subjects to the customer, items,
and sales. The data contained in the data marts tend to be summarized.
Data Marts is divided into two parts:
Independent Data Mart: An independent data mart is sourced from data captured
from one or more operational systems or external data providers, or from data
generated locally within a particular department or geographic area.
Dependent Data Mart: Dependent data marts are sourced directly from enterprise
data warehouses.
Virtual Warehouses
A virtual data warehouse is a set of views over operational databases. For
efficient query processing, only some of the possible summary views may be
materialized. A virtual warehouse is simple to build but requires excess capacity
on operational database servers.
Data Warehouse Delivery Process
Now we discuss the delivery process of the data warehouse. The main steps used
in the data warehouse delivery process are as follows:
IT Strategy: A DWH project must include an IT strategy for procuring and
retaining funding.
Business Case Analysis: After the IT strategy has been designed, the next step is
the business case. It is essential to understand the level of investment that can be
justified and to recognize the projected business benefits which should be derived
from using the data warehouse.
Education & Prototyping: Company will experiment with the ideas of data
analysis and educate themselves on the value of the data warehouse. This is
valuable and should be required if this is the company first exposure to the
benefits of the DS record. Prototyping method can progress the growth of
education. It is better than working models. Prototyping requires business
requirement, technical blueprint, and structures.
Business Requirement: This covers items such as:
• The logical model for data within the data warehouse.
• The source systems that provide this data (mapping rules).
• The business rules to be applied to the information.
• The query profiles for the immediate requirement.
Technical blueprint: It arranges the architecture of the warehouse. The technical
blueprint of the delivery process produces an architecture plan that satisfies long-
term requirements. It lays out the server and data mart architecture and the
essential components of database design.
Building the vision: It is the phase where the first production deliverable is
produced. This stage will probably create significant infrastructure elements for
extracting and loading information but limit them to the extraction and load of
information sources.
History Load: The next step is one where the remainder of the required history
is loaded into the data warehouse. This means that the new entities would not be
added to the data warehouse, but additional physical tables would probably be
created to save the increased record volumes.
AD-Hoc Query: In this step, we configure an ad-hoc query tool to operate
against the data warehouse.
These end-customer access tools are capable of automatically generating the
database query that answers any question posed by the user.
Automation: The automation phase is where many of the operational
management processes are fully automated within the DWH. These would
include:
• Extracting & loading the data from a variety of sources systems
• Transforming the information into a form suitable for analysis
• Backing up, restoring & archiving data
• Generating aggregations from predefined definitions within the Data
Warehouse.
• Monitoring query profiles & determining the appropriate aggregates to
maintain system performance.
Extending Scope: In this phase, the scope of DWH is extended to address a new
set of business requirements. This involves the loading of additional data sources
into the DWH i.e. the introduction of new data marts.
Requirement Evolution: This is the last step of the delivery process of a data
warehouse. Requirements are not static; they evolve continuously. As the
business requirements change, they have to be reflected in the system.
Concept Hierarchy
A concept hierarchy is a directed acyclic graph of concepts, where a unique name
identifies each concept.
An arc from concept a to concept b denotes that a is a more general concept than
b. We can tag text with concepts.
Each text report is tagged by a set of concepts which corresponds to its content.
Tagging a report with a concept implicitly entails tagging it with all the ancestors
of that concept in the hierarchy. It is therefore desirable that a report be tagged
with the lowest concept possible.
The method to automatically tag a report into the hierarchy is a top-down
approach. An evaluation function determines whether a record currently tagged
to a node can also be tagged to any of its child nodes.
If so, the tag moves down the hierarchy until it cannot be pushed any further.
The outcome of this step is a hierarchy of reports; at each node there is a set of
reports sharing the concept related to that node.
The hierarchy of reports resulting from the tagging step is useful for many text
mining processes.
It is assumed that the hierarchy of concepts is known a priori. We can even have
such a hierarchy of documents without a concept hierarchy, by using any
hierarchical clustering algorithm, which results in such a hierarchy.
A concept hierarchy defines a sequence of mappings from a set of specific, low-
level concepts to more general, higher-level concepts.
In a data warehouse, it is usually used to express different levels of granularity of
an attribute from one of the dimension tables.
Concept hierarchies are crucial for the formulation of useful OLAP queries. The
hierarchies allow the user to summarize the data at various levels.
For example, using the location hierarchy, the user can retrieve data which
summarizes sales for each location, for all the areas in a given state, or even a
given country without the necessity of reorganizing the data.
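As a small illustration, a concept hierarchy can be represented as a child-to-
parent mapping that a query engine climbs to generalize a value; the location
names here are hypothetical.

    # Location hierarchy: street < city < state < country.
    parent = {
        "5th Avenue": "New York",        # street -> city
        "New York": "New York State",    # city   -> state
        "New York State": "USA",         # state  -> country
    }

    def generalize(value, levels=1):
        # Climb the hierarchy the requested number of levels.
        for _ in range(levels):
            value = parent.get(value, value)
        return value

    print(generalize("5th Avenue", levels=2))   # New York State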
Online Analytical Processing (OLAP)
OLAP stands for On-Line Analytical Processing. OLAP is a category of
software technology which enables analysts, managers, and executives to gain
insight into information through fast, consistent, interactive access to a wide
variety of possible views of data that has been transformed from raw information
to reflect the real dimensionality of the enterprise as understood by the clients.
OLAP implements the multidimensional analysis of business information and
supports the capability for complex estimations, trend analysis, and sophisticated
data modeling. It is rapidly becoming the essential foundation for intelligent
solutions including Business Performance Management, Planning, Budgeting,
Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge
Discovery, and Data Warehouse Reporting. OLAP enables end-clients to
perform ad hoc analysis of records in multiple dimensions, providing the insight
and understanding they require for better decision making.
OLAP applications are used by a variety of functions in an organization.
Finance and accounting:
o Budgeting
o Activity-based costing
o Financial performance analysis
o And financial modeling
Sales and Marketing
o Sales analysis and forecasting
o Market research analysis
o Promotion analysis
o Customer analysis
o Market and customer segmentation
Production
o Production planning
o Defect analysis
OLAP cubes have two main purposes. The first is to provide business users with
a data model more intuitive to them than a tabular model. This model is called a
Dimensional Model.
The second purpose is to enable fast query response that is usually difficult to
achieve using tabular models.
How OLAP Works?
Fundamentally, OLAP has a very simple concept. It pre-calculates most of the
queries that are typically very hard to execute over tabular databases, namely
aggregation, joining, and grouping. These queries are calculated during a process
that is usually called 'building' or 'processing' of the OLAP cube. This process
happens overnight, and by the time end users get to work - data will have been
updated.
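A toy version of this pre-calculation, with invented fact data: the expensive
grouping is done once while "processing" the cube, and later questions become
cheap lookups against the materialized result.

    import pandas as pd

    # Hypothetical detail-level facts.
    facts = pd.DataFrame({
        "region":  ["north", "north", "south", "south"],
        "quarter": ["Q1", "Q2", "Q1", "Q1"],
        "sales":   [100, 150, 80, 20],
    })

    # 'Building' the cube: aggregates are pre-calculated up front.
    cube = facts.groupby(["region", "quarter"], as_index=False)["sales"].sum()

    # A later query is a lookup in the cube, not a scan over the facts.
    answer = cube[(cube["region"] == "north") & (cube["quarter"] == "Q1")]
    print(answer)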
OLAP Guidelines (Dr.E.F.Codd Rule)
Dr E.F. Codd, the "father" of the relational model, has formulated a list of 12
guidelines and requirements as the basis for selecting OLAP systems:
1) Multidimensional Conceptual View: This is the central feature of an OLAP
system. A multidimensional view makes it possible to carry out methods like
slice and dice.
2) Transparency: Make the technology, underlying information repository,
computing operations, and the dissimilar nature of source data totally transparent
to users. Such transparency helps to improve the efficiency and productivity of
the users.
3) Accessibility: It provides access only to the data that is actually required to
perform the particular analysis, presenting a single, coherent, and consistent view
to the clients. The OLAP system must map its own logical schema to the
heterogeneous physical data stores and perform any necessary transformations.
The OLAP operations should be sitting between data sources (e.g., data
warehouses) and an OLAP front-end.
4) Consistent Reporting Performance: To make sure that the users do not feel
any significant degradation in documenting performance as the number of
dimensions or the size of the database increases. That is, the performance of
OLAP should not suffer as the number of dimensions is increased. Users must
observe consistent run time, response time, or machine utilization every time a
given query is run.
5) Client/Server Architecture: Make the server component of OLAP tools
sufficiently intelligent that the various clients can be attached with a minimum of
effort and integration programming. The server should be capable of mapping
and consolidating data between dissimilar databases.
6) Generic Dimensionality: An OLAP method should treat each dimension as
equivalent in both its structure and operational capabilities. Additional
operational capabilities may be granted to selected dimensions, but such
additional functions should be grantable to any dimension.
7) Dynamic Sparse Matrix Handling: Adapt the physical schema to the specific
analytical model being created and loaded, optimizing sparse matrix handling.
When encountering a sparse matrix, the system must be able to dynamically
deduce the distribution of the information and adjust the storage and access
methods to obtain and maintain a consistent level of performance.
8) Multiuser Support: OLAP tools must provide concurrent data access, data
integrity, and access security.
9) Unrestricted Cross-dimensional Operations: The system should provide the
ability to recognize dimensional order and to perform roll-up and drill-down
methods within a dimension or across dimensions.
10) Intuitive Data Manipulation: Data manipulation fundamental to the
consolidation path, such as reorientation (pivoting), drill-down and roll-up, and
other manipulations, should be accomplished naturally and precisely via point-
and-click and drag-and-drop methods on the cells of the analytical model,
avoiding the use of a menu or multiple trips to a user interface.
11) Flexible Reporting: It gives business clients the ability to organize columns,
rows, and cells in a manner that facilitates simple manipulation, analysis, and
synthesis of data.
12)Unlimited Dimensions and Aggregation Levels: The number of data
dimensions should be unlimited. Each of these common dimensions must allow
a practically unlimited number of customer-defined aggregation levels within any
given consolidation path.
Characteristics of OLAP
The FASMI characteristics of OLAP methods take their name from the first
letters of the five characteristics:

Fast
The system is targeted to deliver most responses to the client within about five
seconds, with the most elementary analyses taking no more than one second and
very few taking more than 20 seconds.
Analysis
The method can cope with any business logic and statistical analysis that is
relevant for the function and the user, while keeping it easy enough for the target
client. Although some preprogramming may be needed, we do not think it
acceptable if all application definitions have to be programmed. It is necessary
to allow the user to define new ad hoc calculations as part of the analysis and to
report on the data in any desired method, without having to program, so we
exclude products (like Oracle Discoverer) that do not allow adequate end-user-
oriented calculation flexibility.
Share
The system implements all the security requirements for confidentiality and, if
multiple write access is needed, concurrent update locking at an appropriate
level. Not all functions need customers to write data back, but for the increasing
number that do, the system should be able to manage multiple updates in a
timely, secure manner.
Multidimensional
This is the basic requirement. The OLAP system must provide a
multidimensional conceptual view of the data, including full support for
hierarchies, as this is certainly the most logical method to analyze businesses and
organizations.
Information
The system should be able to hold all the data needed by the applications. Data
sparsity should be handled in an efficient manner.
The main characteristics of OLAP are as follows:
1. Multidimensional conceptual view: OLAP systems let business users
have a dimensional and logical view of the data in the data warehouse.
This helps in carrying out slice and dice operations.
2. Multi-User Support: Since OLAP techniques are shared, the OLAP
operation should provide normal database operations, including retrieval,
update, concurrency control, integrity, and security.
3. Accessibility: OLAP acts as a mediator between data warehouses and
front-end. The OLAP operations should be sitting between data sources
(e.g., data warehouses) and an OLAP front-end.
4. Storing OLAP results: OLAP results are kept separate from data sources.
5. Uniform reporting performance: Increasing the number of
dimensions or the database size should not significantly degrade the
reporting performance of the OLAP system.
6. OLAP provides for distinguishing between zero values and missing values
so that aggregates are computed correctly.
7. OLAP system should ignore all missing values and compute correct
aggregate values.
8. OLAP facilitates interactive querying and complex analysis for users.
9. OLAP allows users to drill down for greater detail or roll up for
aggregations of metrics along a single business dimension or across
multiple dimensions.
10.OLAP provides the ability to perform intricate calculations and
comparisons.
11.OLAP presents results in a number of meaningful ways, including charts
and graphs.
Benefits of OLAP
OLAP holds several benefits for businesses:
1. OLAP helps managers in decision-making through the multidimensional
record views that it is efficient in providing, thus increasing their
productivity.
2. OLAP functions are self-sufficient owing to the inherent flexibility they
provide to the organized databases.
3. It facilitates simulation of business models and problems through
extensive management of analysis capabilities.
4. In conjunction with a data warehouse, OLAP can be used to support a
reduction in the application backlog, faster data retrieval, and reduction in
query drag.

Online Transaction Processing (OLTP) Vs OLAP


OLTP (On-Line Transaction Processing) is characterized by a large number of
short on-line transactions (INSERT, UPDATE, and DELETE). The primary
emphasis of OLTP operations is on very rapid query processing, maintaining
record integrity in multi-access environments, and effectiveness measured by the
number of transactions per second. An OLTP database holds accurate, current
records, and the schema used to store the transactional database is the entity
model (usually 3NF).
OLAP (On-Line Analytical Processing) is characterized by a relatively low
volume of transactions. Queries are very complex and involve aggregations. For
OLAP operations, response time is the effectiveness measure. OLAP
applications are widely used by data mining techniques. An OLAP database
holds aggregated, historical information, stored in multi-dimensional schemas
(generally the star schema).
The following are the differences between OLAP and OLTP systems.
1) Users: OLTP systems are designed for office workers, while OLAP systems
are designed for decision-makers. Therefore, while an OLTP system may be
accessed by hundreds or even thousands of clients in a huge enterprise, an
OLAP system is likely to be accessed only by a select class of managers and
may be used only by dozens of users.
2) Functions: OLTP systems are mission-critical. They support the day-to-day
operations of an enterprise and are largely performance- and availability-driven.
They carry out simple repetitive operations. OLAP systems are management-
critical; they support an enterprise's decision-support tasks through detailed
analytical investigation.
3) Nature: Although SQL queries return a set of records, OLTP methods are
designed to process one record at a time, for example, a record related to a
customer who may be on the phone or in the store. OLAP systems are not
designed to deal with individual customer records. Instead, they handle queries
that deal with many records at a time and provide summary or aggregate
information to a manager. OLAP applications involve data stored in a data
warehouse that has been extracted from many tables and possibly from more
than one enterprise database.
4) Design: OLTP database operations are designed to be application-oriented,
while OLAP operations are designed to be subject-oriented. OLTP systems view
the enterprise record as a collection of tables (possibly based on an entity-
relationship model), whereas OLAP operations view enterprise information as
multidimensional.
5) Data: OLTP systems usually deal only with the current status of data. For
example, a record about an employee who left three years ago may not be
available in the Human Resources system; the old data may have been archived
on some type of stable storage media and may not be accessible online. On the
other hand, OLAP systems need historical data over several years, since trends
are often essential in decision making.
6) Kind of use: OLTP methods are used for reading and writing operations while
OLAP methods usually do not update the data.
7) View: An OLTP system focuses primarily on the current data within an
enterprise or department, without referring to historical data or data in different
organizations. In contrast, an OLAP system spans multiple versions of a database
schema, due to the evolutionary process of an organization. OLAP systems also
deal with information that originates from different organizations, integrating
information from many data stores. Because of their huge volume, OLAP data
are stored on multiple storage media.
8) Access Patterns: The access pattern of an OLTP system consists primarily of
short, atomic transactions. Such a system requires concurrency control and
recovery techniques. However, access to OLAP systems is mostly read-only
operations because these data warehouses store historical information.
The biggest difference between an OLTP and an OLAP system is the amount of
data analyzed in a single transaction. Whereas an OLTP system handles many
concurrent customers and queries touching only a single record or a limited
collection of records at a time, an OLAP system must have the efficiency to
operate on millions of records to answer a single query.

OLAP operations
In the multidimensional model, records are organized into various dimensions,
and each dimension includes multiple levels of abstraction described by concept
hierarchies. This organization provides users with the flexibility to view data
from various perspectives. A number of OLAP data cube operations exist to
materialize these different views, allowing interactive querying and analysis of
the records at hand. Hence, OLAP supports a user-friendly environment for
interactive data analysis.
Consider the OLAP operations to be performed on multidimensional data. The
figure shows a data cube for the sales of a shop. The cube contains the
dimensions location, time, and item, where location is aggregated with respect to
city values, time is aggregated with respect to quarters, and item is aggregated
with respect to item types.
Roll-Up
The roll-up operation (also known as the drill-up or aggregation operation)
performs aggregation on a data cube, either by climbing up a concept hierarchy
or by dimension reduction. Roll-up is like zooming out on the data cube. The
figure shows the result of a roll-up operation performed on the dimension
location. The hierarchy for location is defined as the order street < city <
province or state < country. The roll-up operation aggregates the data by
ascending the location hierarchy from the level of the city to the level of the
country.
When a roll-up is performed by dimension reduction, one or more dimensions
are removed from the cube. For example, consider a sales data cube having two
dimensions, location and time. Roll-up may be performed by removing the time
dimension, resulting in an aggregation of the total sales by location rather than
by location and by time.
Example
Consider the following cubes illustrating temperature of certain days
recorded weekly:

Temperature 64 65 68 69 70 71 72 75 80 81 83 85

Week1 1 0 1 0 1 0 0 0 0 0 1 0

Week2 0 0 0 1 0 0 1 2 0 1 0 0

Consider that we want to set up levels (hot (80-85), mild (70-75), cool (64-69))
in temperature from the above cube.
To do this, we have to group the columns and add up the values according to the
concept hierarchy. This operation is known as a roll-up.
By doing this, we obtain the following cube:
Temperature cool mild hot

Week1 2 1 1

Week2 1 3 1

The roll-up operation groups the information by levels of temperature.


The following diagram illustrates how roll-up works.
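The same roll-up can be reproduced with a short pandas sketch over the example
data; the level() binning mirrors the concept hierarchy above.

    import pandas as pd

    # Occurrence counts per recorded temperature, from the example above.
    df = pd.DataFrame({
        "temp":  [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85],
        "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
        "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
    })

    def level(t):
        # Climb the concept hierarchy: temperature -> level.
        return "cool" if t <= 69 else ("mild" if t <= 75 else "hot")

    rolled = df.assign(level=df["temp"].map(level)) \
               .groupby("level")[["Week1", "Week2"]].sum()
    print(rolled)   # cool/mild/hot totals per week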

Drill-Down
The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-
down is like zooming in on the data cube. It navigates from less detailed records
to more detailed data. Drill-down can be performed by either stepping down a
concept hierarchy for a dimension or adding additional dimensions.
The figure shows a drill-down operation performed on the dimension time by
stepping down a concept hierarchy defined as day < month < quarter < year.
Drill-down occurs by descending the time hierarchy from the level of the quarter
to the more detailed level of the month.
Because a drill-down adds more details to the given data, it can also be performed
by adding a new dimension to a cube. For example, a drill-down on the central
cubes of the figure can occur by introducing an additional dimension, such as a
customer group.
Example
Drill-down adds more details to the given data

Temperature Cool mild hot

Day 1 0 0 0

Day 2 0 0 0

Day 3 0 0 1

Day 4 0 1 0

Day 5 1 0 0

Day 6 0 0 0

Day 7 1 0 0

Day 8 0 0 0

Day 9 1 0 0

Day 10 0 1 0

Day 11 0 1 0
Day 12 0 1 0

Day 13 0 0 1

Day 14 0 0 0

The following diagram illustrates how Drill-down works.
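Continuing the temperature example, drill-down steps from the week level back
to the day level; rolling the daily detail back up reproduces the weekly cube, as
the short pandas sketch below checks.

    import pandas as pd

    # Daily detail from the example table (cool/mild/hot counts per day).
    daily = pd.DataFrame({
        "day":  range(1, 15),
        "cool": [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
        "mild": [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
        "hot":  [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    })

    # Aggregating days 1-7 and 8-14 reproduces the weekly (rolled-up) cube.
    weekly = daily.assign(week=(daily["day"] - 1) // 7 + 1) \
                  .groupby("week")[["cool", "mild", "hot"]].sum()
    print(weekly)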

Slice
A slice is a subset of the cube corresponding to a single value for one or more
members of a dimension. For example, a slice operation is executed when the
customer wants a selection on one dimension of a three-dimensional cube,
resulting in a two-dimensional slice. So, the slice operation performs a selection
on one dimension of the given cube, thus resulting in a subcube.
For example, if we make the selection temperature = cool, we will obtain the
following cube:
Temperature cool

Day 1 0

Day 2 0

Day 3 0

Day 4 0

Day 5 1

Day 6 0

Day 7 1

Day 8 0

Day 9 1

Day 10 0

Day 11 0

Day 12 0

Day 13 0

Day 14 0
The following diagram illustrates how Slice works.

Here the slice is performed on the dimension "time" using the criterion time =
"Q1".
It forms a new subcube by selecting one or more dimensions.
Dice
The dice operation describes a subcube by performing a selection on two or more
dimensions.
For example, applying the selection (time = day 3 OR time = day 4) AND
(temperature = cool OR temperature = hot) to the original cube, we get the
following subcube (still two-dimensional):
Temperature   cool  hot
Day 3            0    1
Day 4            0    0
Consider the following diagram, which shows the dice operation.
The dice operation on the cube shown in the diagram is based on the following selection criteria, which involve three dimensions (a code sketch of a two-dimensional dice follows the list):
o (location = "Toronto" or "Vancouver")
o (time = "Q1" or "Q2")
o (item = "Mobile" or "Modem")
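A minimal sketch of the day/temperature dice from the example above, again on the hypothetical day-level frame; .loc applies the selection to both dimensions at once:

```python
import pandas as pd

# Day-level cube: rows = days, columns = temperature levels.
cube = pd.DataFrame(
    {"cool": [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
     "mild": [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
     "hot":  [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]},
    index=pd.RangeIndex(1, 15, name="day"),
)

# Dice: select on two dimensions at once --
# (day = 3 OR day = 4) AND (temperature = cool OR temperature = hot).
subcube = cube.loc[[3, 4], ["cool", "hot"]]
print(subcube)
# day   cool  hot
# 3        0    1
# 4        0    0
```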
Pivot
The pivot operation is also called rotation. Pivot is a visualization operation that rotates the data axes in order to provide an alternative presentation of the data. It may involve swapping the rows and columns, or moving one of the row dimensions into the column dimensions.
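For a two-dimensional view, pivoting amounts to swapping the axes. A minimal sketch, using the rolled-up week-by-temperature view from earlier:

```python
import pandas as pd

# Week x temperature-level view (values from the roll-up example).
view = pd.DataFrame({"cool": [2, 1], "mild": [1, 3], "hot": [1, 1]},
                    index=pd.Index(["Week1", "Week2"], name="week"))

# Pivot (rotation): transpose so temperature levels become rows
# and weeks become columns -- same data, alternative presentation.
print(view.T)
```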
Consider the following diagram, which shows the pivot operation.
Other OLAP Operations
The drill-across operation executes queries involving more than one fact table. The drill-through operation makes use of relational SQL facilities to drill through the bottom level of a data cube down to its back-end relational tables.
Other OLAP operations may include ranking the top-N or bottom-N elements in lists, as well as computing moving averages, growth rates, interest, internal rates of return, depreciation, currency conversions, and statistical functions.
OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios, variances, and so on, and for computing measures across multiple dimensions. It can generate summarization, aggregation, and hierarchies at each granularity level and at every dimension intersection. OLAP also provides functional models for forecasting, trend analysis, and statistical analysis. In this context, the OLAP engine is a powerful data analysis tool.
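Several of these operations map onto ordinary series computations. A minimal pandas sketch (the monthly sales figures below are made up for illustration):

```python
import pandas as pd

# Hypothetical monthly sales measure.
sales = pd.Series([120, 135, 150, 110, 170, 160],
                  index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"])

top3 = sales.nlargest(3)                      # top-N ranking
moving_avg = sales.rolling(window=3).mean()   # 3-month moving average
growth = sales.pct_change()                   # month-over-month growth rate
print(top3, moving_avg, growth, sep="\n\n")
```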
Types of OLAP
There are three main types of OLAP servers:
o ROLAP stands for Relational OLAP, an application based on relational DBMSs.
o MOLAP stands for Multidimensional OLAP, an application based on multidimensional DBMSs.
o HOLAP stands for Hybrid OLAP, an application using both relational and multidimensional techniques.
Relational OLAP (ROLAP) Server
These are intermediate servers that stand between a relational back-end server and user front-end tools. They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to provide the missing pieces. ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services. ROLAP technology tends to have greater scalability than MOLAP technology.
ROLAP systems work primarily from the data that resides in a relational database, where the base data and dimension tables are stored as relational tables. This model permits multidimensional analysis of the data.
This technique relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each act of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement.
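This WHERE-clause equivalence is easy to demonstrate against a relational store. A minimal sketch using SQLite from Python's standard library (the fact table name and its columns are hypothetical):

```python
import sqlite3

# A tiny relational fact table standing in for the warehouse base data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, level TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(3, "hot", 1), (4, "mild", 1), (5, "cool", 1)])

# A slice (temperature = 'cool') becomes a WHERE clause ...
cool = conn.execute(
    "SELECT day, cnt FROM sales WHERE level = 'cool'").fetchall()

# ... and a dice just adds more predicates to that clause.
diced = conn.execute(
    "SELECT day, level, cnt FROM sales "
    "WHERE day IN (3, 4) AND level IN ('cool', 'hot')").fetchall()
print(cool)   # [(5, 1)]
print(diced)  # [(3, 'hot', 1)]
```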
Relational OLAP Architecture
ROLAP architecture includes the following components:
o Database server.
o ROLAP server.
o Front-end tool.
Relational OLAP (ROLAP) is the latest and fastest-growing OLAP technology segment in the market. This method allows multiple multidimensional views of two-dimensional relational tables to be created, avoiding the need to structure records around the desired view.
Some products in this segment have supported strong SQL engines to handle the complexity of multidimensional analysis. This includes creating multiple SQL statements to handle user requests, being 'RDBMS'-aware, and being capable of generating SQL statements tuned to the optimizer of the DBMS engine.
Advantages
Can handle large amounts of information: The data size limitation of ROLAP technology depends on the data size of the underlying RDBMS, so ROLAP itself does not restrict the amount of data.
Can leverage the functionalities of the relational database: The RDBMS already comes with a host of features, so ROLAP technologies, which work on top of the RDBMS, can leverage these functionalities.
Disadvantages
Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) against the relational database, the query time can be long if the underlying data size is large.
Limited by SQL functionalities: ROLAP technology relies on developing SQL statements to query the relational database, and SQL statements do not fit all needs.
Multidimensional OLAP (MOLAP) Server
A MOLAP system is based on a native logical model that directly supports multidimensional data and operations. Data are stored physically in multidimensional arrays, and positional techniques are used to access them.
One of the significant distinctions of MOLAP from ROLAP is that the data are summarized and stored in an optimized format in a multidimensional cube, rather than in a relational database. In the MOLAP model, data are structured into proprietary formats according to clients' reporting requirements, with the calculations pre-generated on the cubes.
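The array storage and positional access can be sketched with NumPy (the dimensions, member lists, and the stored value below are hypothetical):

```python
import numpy as np

# A MOLAP-style cube stored physically as a dense 3-D array:
# axis 0 = location, axis 1 = time, axis 2 = item.
locations = ["Toronto", "Vancouver"]
quarters = ["Q1", "Q2", "Q3", "Q4"]
items = ["Mobile", "Modem"]
cube = np.zeros((len(locations), len(quarters), len(items)))

# Writing a cell is pure positional addressing -- no join, no scan.
cube[0, 0, 1] = 605.0  # (Toronto, Q1, Modem): made-up sales figure

# Reading Toronto's Q1 row across all items is likewise positional.
print(cube[locations.index("Toronto"), quarters.index("Q1"), :])
```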
MOLAP Architecture
MOLAP architecture includes the following components:
o Database server.
o MOLAP server.
o Front-end tool.
A MOLAP structure primarily reads precompiled data. It has limited capabilities to dynamically create aggregations or to evaluate results that have not been pre-calculated and stored.
Applications requiring iterative and comprehensive time-series analysis of trends are well suited to MOLAP technology (e.g., financial analysis and budgeting). Examples include Arbor Software's Essbase, Oracle's Express Server, Pilot Software's Lightship Server, Sinper's TM/1, Planning Sciences' Gentia, and Kenan Technology's Multiway.
Some of the problems faced by clients are related to maintaining support for multiple subject areas in an RDBMS. Some vendors address these problems by providing access from MOLAP tools to the detailed data in an RDBMS.
This can be very useful for organizations with performance-sensitive multidimensional analysis requirements that have built, or are in the process of building, a data warehouse architecture containing multiple subject areas.
An example would be the creation of sales data measured by several dimensions (e.g., product and sales region) to be stored and maintained in a persistent structure. This structure would be provided to reduce the application overhead of performing calculations and building aggregations during initialization. Such structures can be automatically refreshed at predetermined intervals established by an administrator.
Advantages
Excellent performance: A MOLAP cube is built for fast information retrieval and is optimal for slicing and dicing operations.
Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only possible, they also return quickly.
Disadvantages
Limited in the amount of information it can handle: Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself.
Requires additional investment: Cube technology is generally proprietary and does not already exist in the organization. Therefore, to adopt MOLAP technology, additional investments in human and capital resources are likely to be needed.
Hybrid OLAP (HOLAP) Server
HOLAP incorporates the best features of MOLAP and ROLAP into a single architecture. HOLAP systems store the larger quantities of detailed data in relational tables, while the aggregations are stored in pre-calculated cubes. HOLAP can also drill through from the cube down to the relational tables for detailed data. Microsoft SQL Server 2000 provides a hybrid OLAP server.
Advantages of HOLAP
1. HOLAP provides the benefits of both MOLAP and ROLAP.
2. It provides fast access at all levels of aggregation.
3. HOLAP balances the disk space requirement, as it stores only the aggregate information on the OLAP server while the detail data remains in the relational database, so no duplicate copy of the detail data is maintained.
Disadvantages of HOLAP
1. The HOLAP architecture is very complicated because it supports both MOLAP and ROLAP servers.
Other Types
There are also less common types of OLAP, which one may stumble upon every so often. Some of the less popular variants in the OLAP industry are listed below.
Web-Enabled OLAP (WOLAP) Server
WOLAP refers to an OLAP application that is accessible via a web browser. Unlike traditional client/server OLAP applications, WOLAP has a three-tiered architecture consisting of a client, middleware, and a database server.
Desktop OLAP (DOLAP) Server
DOLAP permits a user to download a section of the data from the database or source and work with that dataset locally on their desktop.
Mobile OLAP (MOLAP) Server
Mobile OLAP enables users to access and work on OLAP data and applications remotely through their mobile devices.
Spatial OLAP (SOLAP) Server
SOLAP combines the capabilities of both Geographic Information Systems (GIS) and OLAP into a single user interface. It facilitates the management of both spatial and non-spatial data.
ROLAP Vs MOLAP Vs HOLAP.
Expansion
o ROLAP stands for Relational Online Analytical Processing.
o MOLAP stands for Multidimensional Online Analytical Processing.
o HOLAP stands for Hybrid Online Analytical Processing.
Storage mode
o The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition's data source.
o The MOLAP storage mode causes the aggregations of the partition, and a copy of its source data, to be stored in a multidimensional structure in Analysis Services when the partition is processed.
o The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP causes the aggregations of the partition to be stored in a multidimensional structure in an SQL Server Analysis Services instance.
Copy of source data
o ROLAP does not cause a copy of the source data to be stored in the Analysis Services data folders. Instead, when the result cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries.
o The MOLAP structure is highly optimized to maximize query performance. The storage area can be on the computer where the partition is defined or on another computer running Analysis Services. Because a copy of the source data resides in the multidimensional structure, queries can be resolved without accessing the partition's source data.
o HOLAP does not cause a copy of the source data to be stored. For queries that access only summary data in the aggregations of a partition, HOLAP is the equivalent of MOLAP.
Query performance
o Query response is frequently slower with ROLAP storage than with the MOLAP or HOLAP storage modes. Processing time is also frequently slower with ROLAP.
o With MOLAP, query response times can be reduced substantially by using aggregations. The data in the partition's MOLAP structure is only as current as the most recent processing of the partition.
o With HOLAP, queries that must access the source data (for example, a drill-down to an atomic cube cell for which there is no aggregation data) retrieve data from the relational database and will not be as fast as they would be if the source data were stored in the MOLAP structure.
Difference between ROLAP and MOLAP
o ROLAP stands for Relational Online Analytical Processing, while MOLAP stands for Multidimensional Online Analytical Processing.
o ROLAP is usually used when the data warehouse contains relational data, while MOLAP is used when the data warehouse contains relational as well as non-relational data.
o ROLAP contains an analytical server, while MOLAP contains an MDDB (multidimensional database) server.
o ROLAP creates a multidimensional view of the data dynamically, while MOLAP contains prefabricated data cubes.
o ROLAP is very easy to implement, while MOLAP is difficult to implement.
o ROLAP has a higher response time, while MOLAP has a lower response time due to its prefabricated cubes.
o ROLAP requires less memory, while MOLAP requires a large amount of memory.