DW Unit II Notes
ETL Tools: Most commonly used ETL tools are Hevo, Sybase, Oracle Warehouse
builder, CloverETL, and MarkLogic.
Data Warehouses: Most commonly used Data Warehouses are Snowflake,
Redshift, BigQuery, and Firebolt.
ADVANTAGES AND DISADVANTAGES:
Advantages of ETL process in data warehousing:
1. Improved data quality: ETL process ensures that the data in the data
warehouse is accurate, complete, and up-to-date.
2. Better data integration: ETL process helps to integrate data from multiple
sources and systems, making it more accessible and usable.
3. Increased data security: ETL process can help to improve data security
by controlling access to the data warehouse and ensuring that only
authorized users can access the data.
4. Improved scalability: ETL process can help to improve scalability by
providing a way to manage and analyze large amounts of data.
5. Increased automation: ETL tools and technologies can automate and
simplify the ETL process, reducing the time and effort required to load and
update data in the warehouse.
Disadvantages of ETL process in data warehousing:
1. High cost: ETL process can be expensive to implement and maintain,
especially for organizations with limited resources.
2. Complexity: ETL process can be complex and difficult to implement,
especially for organizations that lack the necessary expertise or resources.
3. Limited flexibility: ETL process can be limited in terms of flexibility, as
it may not be able to handle unstructured data or real-time data streams.
4. Limited scalability: ETL process can be limited in terms of scalability, as
it may not be able to handle very large amounts of data.
5. Data privacy concerns: ETL process can raise concerns about data
privacy, as large amounts of data are collected, stored, and analyzed.
Overall, ETL process is an essential process in data warehousing that helps to
ensure that the data in the data warehouse is accurate, complete, and up-to-date.
However, it also comes with its own set of challenges and limitations, and
organizations need to carefully consider the costs and benefits before
implementing it.
ETL Vs ELT
In managing and analyzing data, there are two main approaches you might hear
about: Extraction, Load, and Transform (ELT) and Extraction, Transform, and
Load (ETL). These methods help move data from various sources into a data
warehouse, making it ready for analysis. Understanding the differences between
ELT and ETL can help you choose the right approach for your needs.
What is Extraction, Load, and Transform (ELT)?
ELT is a method where you first pull raw data from different sources, then put it
into a data warehouse, and finally transform it into the format needed for analysis.
Unlike ETL, where data is transformed before it gets stored, ELT keeps the data
in its original form until it is needed.
Brief History of ETL & ELT
ETL (Extraction, Transform, Load) was the traditional way of handling data. Over
time, with the increase in data volume and the need for real-time processing, ELT
was developed as a more flexible alternative. ELT builds on ETL but changes the
order of operations to better use modern data storage technologies.
ELT Process
Extraction, Load and Transform (ELT) is the technique of extracting raw data
from the source, storing it in the data warehouse of the target server, and
preparing it for downstream users.
ELT comprises three different operations performed on the data:
1. Extract
Extracting data is the technique of identifying data from one or more
sources. The sources may be databases, files, ERP, CRM or any other
useful source of data.
2. Load
Loading is the process of storing the extracted raw data in data warehouse
or data lakes.
3. Transform
Data transformation is the process in which the raw data source is
transformed to the target format required for analysis.
Data from the sources are extracted and stored in the data warehouse. The entire
data is not transformed but only the required transformation is done when
necessary. Raw data can be retrieved from the warehouse anytime when required.
The data transformed as required is then sent forward for analysis. When you use
ELT, you move the entire data set as it exists in the source systems to the target.
This means that you have the raw data at your disposal in the data warehouse, in
contrast to the ETL approach.
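To make the order of operations concrete, here is a minimal ELT sketch in Python. It is illustrative only: an in-memory SQLite database stands in for the warehouse, and the table names, column names, and sample rows are assumptions, not part of any particular tool.

```python
import sqlite3

# Minimal ELT sketch: extract raw rows, load them unchanged into the
# warehouse, then transform inside the warehouse only when needed.
# The "warehouse" is an in-memory SQLite database; all names are illustrative.

# 1. Extract - raw records as they exist in the source system
raw_sales = [
    ("2024-01-05", "chennai", 1200.50),
    ("2024-01-06", "mumbai",  980.00),
    ("2024-02-01", "chennai", 450.25),
]

warehouse = sqlite3.connect(":memory:")

# 2. Load - store the raw data as-is; no cleansing or aggregation yet
warehouse.execute("CREATE TABLE raw_sales (sale_date TEXT, city TEXT, amount REAL)")
warehouse.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_sales)

# 3. Transform - done inside the warehouse, on demand, for a specific analysis;
#    the raw table stays available for any future transformation
warehouse.execute("""
    CREATE TABLE monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS month, city, SUM(amount) AS total
    FROM raw_sales
    GROUP BY month, city
""")

for row in warehouse.execute("SELECT * FROM monthly_sales ORDER BY month, city"):
    print(row)
```

Note how the raw_sales table remains in the warehouse, so any new transformation can be run against it later without re-extracting from the source.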
ETL Process
ETL is the traditional technique of extracting raw data, transforming it for the
users as required and storing it in data warehouses. ELT was later developed,
having ETL as its base. The three operations happening in ETL and ELT are the
same except that their order of processing is slightly varied. This change in
sequence was made to overcome some drawbacks.
1. Extract
It is the process of extracting raw data from all available data sources such
as databases, files, ERP (Enterprise Resource Planning), CRM (Customer
Relationship Management) or any other.
2. Transform
The extracted data is immediately transformed as required by the user.
3. Load
The transformed data is then loaded into the data warehouse from where
the users can access it.
The data collected from the sources are directly stored in the staging area. The
transformations required are performed on the data in the staging area. Once the
data is transformed, the resultant data is stored in the data warehouse. The main
drawback of ETL architecture is that once the transformed data is stored in the
warehouse, it cannot be modified again whereas in ELT, a copy of the raw data is
always available in the warehouse and only the required data is transformed when
needed.
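For contrast, the following sketch shows the ETL ordering, where the transformation happens in a staging step before anything reaches the warehouse. Again, this is only an illustration under assumed names and sample data, using an in-memory SQLite database as the warehouse.

```python
import sqlite3

# Minimal ETL sketch: the transformation happens in a staging step *before*
# the load, so only the transformed result enters the warehouse and the raw,
# untransformed rows are not retained there.

# 1. Extract
raw_sales = [
    ("2024-01-05", "chennai", 1200.50),
    ("2024-01-06", "mumbai",  980.00),
    ("2024-02-01", "chennai", 450.25),
]

# 2. Transform - performed in the staging area (here, plain Python):
#    derive the month and aggregate before loading
staged = {}
for sale_date, city, amount in raw_sales:
    key = (sale_date[:7], city)                # (month, city)
    staged[key] = staged.get(key, 0.0) + amount

# 3. Load - only the transformed data is stored in the warehouse
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE monthly_sales (month TEXT, city TEXT, total REAL)")
warehouse.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(m, c, t) for (m, c), t in staged.items()],
)

for row in warehouse.execute("SELECT * FROM monthly_sales ORDER BY month, city"):
    print(row)
```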
Difference between ELT and ETL
Here are some differences between ELT and ETL:
ELT:
• As all components are in one system, loading is done only once.
• It is cost effective and available to all businesses using SaaS solutions.
ETL:
• As ETL uses a staging area, extra time is required to load the data.
• It is not cost effective for small and medium businesses.
Benefits of an EDW
• Improved Decision Making: Provides a comprehensive view of the
business for informed decisions.
• Enhanced Customer Satisfaction: Enables better understanding of
customer behavior and preferences.
• Increased Operational Efficiency: Identifies opportunities for process
optimization.
• Competitive Advantage: Uncovers market trends and opportunities.
• Cost Reduction: Optimizes resource allocation and reduces operational
costs.
Challenges of Implementing an EDW
• High Cost: Requires significant investment in hardware, software, and
personnel.
• Complexity: Integrating and transforming data from multiple sources can
be challenging.
• Data Quality: Ensuring data accuracy and consistency is crucial.
• Performance: Handling large volumes of data requires efficient query
processing.
Modern EDW Trends
• Cloud-Based EDWs: Leveraging cloud platforms for scalability and cost-
efficiency.
• Data Lakes: Combining structured and unstructured data for advanced
analytics.
• Data Governance: Implementing robust data management policies and
procedures.
• Self-Service BI: Empowering business users with data access and analysis
capabilities.
Data granularity: an ODS holds detailed data, whereas the EDW holds summarized data.
Benefits of an ODS
• Improved operational efficiency: Provides timely data for operational
decisions.
• Enhanced data quality: Cleanses and transforms data before loading into
the data warehouse.
• Faster reporting: Reduces query response time for operational reports.
• Reduced load on operational systems: Offloads reporting workload from
transactional systems.
Data Mart
A data mart is a subset of a data warehouse built to maintain a particular
department, region, or business unit. Every department of a business has a central
repository or data mart to store data. The data from the data mart is stored in the
ODS periodically. The ODS then sends the data to the EDW, where it is stored
and used.
A data mart is a subset of a data warehouse focused on a specific line of
business, department, or subject area. It contains a selected portion of data
from the larger data warehouse, tailored to the specific needs of a particular group
of users.
Top-down approach: breaks the vast problem into smaller subproblems.
Bottom-up approach: solves the essential low-level problems and integrates them into a higher one.
The data within the data warehouse itself has a particular architecture, with an
emphasis on various levels of summarization, as shown in the figure:
The current detail data is central in importance as it:
o Reflects the most current happenings, which are commonly the most
interesting.
o It is voluminous, as it is saved at the lowest level of granularity.
o It is almost always saved on disk storage, which is fast to access but
expensive and difficult to manage.
o Older detail data is stored in some form of mass storage; it is infrequently
accessed and kept at a level of detail consistent with current detailed data.
o Lightly summarized data is data extracted from the low level of detail found at
the current detailed level and is usually stored on disk storage. When building
the data warehouse, we have to remember what unit of time the summarization
is done over, and also which components or attributes the summarized data
will contain.
o Highly summarized data is compact and directly available and can even be
found outside the warehouse.
Metadata is the final component of the data warehouse. It plays a different role
than the data drawn from the operational environment, and it is used as:
o A directory to help the DSS analyst locate the contents of the data
warehouse.
o A guide to the mapping of data as it is transformed from the
operational environment to the data warehouse environment.
o A guide to the algorithms used for summarization between the current
detailed data and the lightly summarized data, and between the lightly
summarized data and the highly summarized data, etc.
Data Modeling Life Cycle
In this section, we define a data modeling life cycle. It is a straightforward
process of transforming business requirements to fulfill the goals for storing,
maintaining, and accessing the data within IT systems. The result is a logical and
physical data model for an enterprise data warehouse.
The objective of the data modeling life cycle is primarily the creation of a storage
area for business information. That area comes from the logical and physical data
modeling stages, as shown in Figure:
Enterprise Warehouse
An enterprise warehouse collects all of the information about subjects spanning
the entire organization. It supports corporate-wide data integration, usually from
one or more operational systems or external data providers, and it is cross-functional
in scope. It generally contains detailed information as well as summarized
information and can range in size from a few gigabytes to hundreds of
gigabytes, terabytes, or beyond.
An enterprise data warehouse may be implemented on traditional mainframes,
UNIX superservers, or parallel architecture platforms. It requires extensive
business modeling and may take years to design and build.
Data Mart
A data mart includes a subset of corporate-wide data that is of value to a specific
collection of users. The scope is confined to particular selected subjects. For
example, a marketing data mart may restrict its subjects to the customer, items,
and sales. The data contained in the data marts tend to be summarized.
Data Marts is divided into two parts:
Independent Data Mart: An independent data mart is sourced from data captured
from one or more operational systems or external data providers, or from data
generated locally within a particular department or geographic area.
Dependent Data Mart: Dependent data marts are sourced directly from
enterprise data warehouses.
Virtual Warehouses
A virtual data warehouse is a set of views over operational databases. For
efficient query processing, only some of the possible summary views may be
materialized. A virtual warehouse is easy to build but requires excess capacity
on the operational database servers.
Data Warehouse Delivery Process
Now we discuss the delivery process of the data warehouse. The main steps used
in the data warehouse delivery process are as follows:
IT Strategy: The DWH project must have an IT strategy for procuring and
retaining funding.
Business Case Analysis: After the IT strategy has been designed, the next step is
the business case. It is essential to understand the level of investment that can be
justified and to recognize the projected business benefits which should be derived
from using the data warehouse.
Education & Prototyping: Company will experiment with the ideas of data
analysis and educate themselves on the value of the data warehouse. This is
valuable and should be required if this is the company first exposure to the
benefits of the DS record. Prototyping method can progress the growth of
education. It is better than working models. Prototyping requires business
requirement, technical blueprint, and structures.
Business Requirements: These include:
• The logical model for data within the data warehouse.
• The source system that provides this data (mapping rules)
• The business rules to be applied to information.
• The query profiles for the immediate requirement
Technical Blueprint: It establishes the architecture of the warehouse. The technical
blueprint phase of the delivery process produces an architecture plan which satisfies
long-term requirements. It lays out the server and data mart architecture and the
essential components of the database design.
Building the vision: It is the phase where the first production deliverable is
produced. This stage will probably create significant infrastructure elements for
extracting and loading information but limit them to the extraction and load of
information sources.
History Load: The next step is the one where the remainder of the required history
is loaded into the data warehouse. This means that new entities would not be
added to the data warehouse, but additional physical tables would probably be
created to store the increased data volumes.
AD-Hoc Query: In this step, we configure an ad-hoc query tool to operate
against the data warehouse.
These end-customer access tools are capable of automatically generating the
database query that answers any question posed by the user.
Automation: The automation phase is where many of the operational
management processes are fully automated within the DWH. These would
include:
• Extracting & loading the data from a variety of sources systems
• Transforming the information into a form suitable for analysis
• Backing up, restoring & archiving data
• Generating aggregations from predefined definitions within the Data
Warehouse.
• Monitoring query profiles & determining the appropriate aggregates to
maintain system performance.
Extending Scope: In this phase, the scope of DWH is extended to address a new
set of business requirements. This involves the loading of additional data sources
into the DWH i.e. the introduction of new data marts.
Requirement Evolution: This is the last step of the delivery process of a data
warehouse. As we all know, requirements are not static and evolve
continuously. As the business requirements change, these changes must be
reflected in the system.
Concept Hierarchy
A concept hierarchy is a directed acyclic graph of concepts, where a unique name
identifies each concept.
An arc from concept a to concept b denotes that a is a more general concept than b.
We can tag text with concepts.
Each text report is tagged by a set of concepts which correspond to its content.
Tagging a report with a concept implicitly entails tagging it with all the ancestors
of that concept in the hierarchy. It is therefore desirable that a report be tagged
with the lowest (most specific) concept possible.
The method to automatically tag a report to the hierarchy is a top-down
approach. An evaluation function determines whether a report currently tagged
to a node can also be tagged to any of its child nodes.
If so, the tag moves down the hierarchy until it cannot be pushed any
further.
The outcome of this step is a hierarchy of reports and, at each node, there is a set
of reports having a common concept related to the node.
The hierarchy of reports resulting from the tagging step is useful for many text
mining processes.
It is assumed that the hierarchy of concepts is known a priori. We can even obtain
such a hierarchy of documents without a concept hierarchy, by using any
hierarchical clustering algorithm that produces such a hierarchy.
A concept hierarchy defines a sequence of mappings from a set of specific, low-
level concepts to more general, higher-level concepts.
In a data warehouse, it is usually used to express different levels of granularity of
an attribute from one of the dimension tables.
Concept hierarchies are crucial for the formulation of useful OLAP queries. The
hierarchies allow the user to summarize the data at various levels.
For example, using the location hierarchy, the user can retrieve data which
summarizes sales for each location, for all the areas in a given state, or even a
given country without the necessity of reorganizing the data.
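As a small illustration of this idea, the sketch below encodes a location hierarchy (city < state < country) as plain Python mappings and summarizes the same sales facts at different levels without reorganizing the data. The hierarchy entries and sales figures are made up for the example.

```python
# Sketch of a concept hierarchy for the location dimension and of summarizing
# the same facts at different levels of the hierarchy.
from collections import defaultdict

# low-level concept -> more general concept
city_to_state = {"chennai": "tamil nadu", "mumbai": "maharashtra", "pune": "maharashtra"}
state_to_country = {"tamil nadu": "india", "maharashtra": "india"}

sales = [("chennai", 1200), ("mumbai", 980), ("pune", 450)]

def summarize(facts, level):
    """Aggregate sales at the requested level of the location hierarchy."""
    totals = defaultdict(int)
    for city, amount in facts:
        if level == "city":
            key = city
        elif level == "state":
            key = city_to_state[city]
        else:  # "country"
            key = state_to_country[city_to_state[city]]
        totals[key] += amount
    return dict(totals)

print(summarize(sales, "city"))     # {'chennai': 1200, 'mumbai': 980, 'pune': 450}
print(summarize(sales, "state"))    # {'tamil nadu': 1200, 'maharashtra': 1430}
print(summarize(sales, "country"))  # {'india': 2630}
```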
Online Analytical Processing (OLAP)
OLAP stands for On-Line Analytical Processing. OLAP is a category of
software technology which enables analysts, managers, and executives to gain
insight into data through fast, consistent, interactive access to a wide
variety of possible views of information that has been transformed from raw data
to reflect the real dimensionality of the enterprise as understood by the user.
OLAP implements multidimensional analysis of business data and
supports capabilities for complex calculations, trend analysis, and sophisticated
data modeling. It is rapidly becoming the essential foundation for Intelligent
Solutions including Business Performance Management, Planning, Budgeting,
Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge
Discovery, and Data Warehouse Reporting. OLAP enables end users to
perform ad hoc analysis of data in multiple dimensions, providing the insight
and understanding they require for better decision making.
OLAP applications are used by a variety of the functions of an organization.
Finance and accounting:
o Budgeting
o Activity-based costing
o Financial performance analysis
o And financial modeling
Sales and Marketing
o Sales analysis and forecasting
o Market research analysis
o Promotion analysis
o Customer analysis
o Market and customer segmentation
Production
o Production planning
o Defect analysis
OLAP cubes have two main purposes. The first is to provide business users with
a data model more intuitive to them than a tabular model. This model is called a
Dimensional Model.
The second purpose is to enable fast query response that is usually difficult to
achieve using tabular models.
How OLAP Works?
Fundamentally, OLAP has a very simple concept. It pre-calculates most of the
queries that are typically very hard to execute over tabular databases, namely
aggregation, joining, and grouping. These queries are calculated during a process
that is usually called 'building' or 'processing' of the OLAP cube. This process
happens overnight, and by the time end users get to work - data will have been
updated.
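The sketch below illustrates this pre-calculation idea in Python: totals are computed once for every combination of dimensions (with "*" meaning "rolled up over that dimension"), so answering a query later is a single dictionary look-up rather than a scan. The dimensions, fact rows, and key layout are assumptions made for the illustration, not how any particular OLAP engine stores its cube.

```python
# Sketch of 'building' an OLAP cube: aggregations over every combination of
# dimensions are pre-computed so that query time becomes a simple look-up.
from itertools import product
from collections import defaultdict

facts = [  # (location, quarter, item, sales)
    ("chennai", "Q1", "laptop", 10),
    ("chennai", "Q2", "phone",   7),
    ("mumbai",  "Q1", "laptop",  4),
]

dimensions = ("location", "quarter", "item")

# Pre-calculate totals for every subset of dimensions ('*' = rolled up)
cube = defaultdict(int)
for loc, qtr, item, sales in facts:
    values = (loc, qtr, item)
    for mask in product((True, False), repeat=len(dimensions)):
        key = tuple(v if keep else "*" for v, keep in zip(values, mask))
        cube[key] += sales

# Query time: these aggregates are now single look-ups into the pre-built cube.
print(cube[("*", "*", "laptop")])   # total laptop sales everywhere -> 14
print(cube[("chennai", "*", "*")])  # total sales in chennai       -> 17
```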
OLAP Guidelines (Dr.E.F.Codd Rule)
Dr E.F. Codd, the "father" of the relational model, has formulated a list of 12
guidelines and requirements as the basis for selecting OLAP systems:
1) Multidimensional Conceptual View: This is the central feature of an OLAP
system. By requiring a multidimensional view, it is possible to carry out operations
like slice and dice.
2) Transparency: Make the technology, underlying information repository,
computing operations, and the dissimilar nature of source data totally transparent
to users. Such transparency helps to improve the efficiency and productivity of
the users.
3) Accessibility: The system provides access only to the data that is actually
required to perform the particular analysis, presenting a single, coherent, and
consistent view to the users. The OLAP system must map its own logical schema to the
heterogeneous physical data stores and perform any necessary transformations.
The OLAP operations should be sitting between data sources (e.g., data
warehouses) and an OLAP front-end.
4) Consistent Reporting Performance: To make sure that users do not experience
any significant degradation in reporting performance as the number of
dimensions or the size of the database increases. That is, the performance of
OLAP should not suffer as the number of dimensions is increased. Users must
observe consistent run time, response time, or machine utilization every time a
given query is run.
5) Client/Server Architecture: Make the server component of OLAP tools
sufficiently intelligent that various clients can be attached with a minimum of
effort and integration programming. The server should be capable of mapping
and consolidating data between dissimilar databases.
6) Generic Dimensionality: An OLAP system should treat each dimension as
equivalent in both its structure and operational capabilities. Additional operational
capabilities may be granted to selected dimensions, but such additional functions
should be grantable to any dimension.
7) Dynamic Sparse Matrix Handling: The physical schema should adapt to the
specific analytical model being created and loaded, optimizing sparse matrix
handling. When encountering a sparse matrix, the system must be able to
dynamically deduce the distribution of the data and adjust the storage and
access methods to obtain and maintain a consistent level of performance.
8) Multiuser Support: OLAP tools must provide concurrent data access, data
integrity, and access security.
9) Unrestricted Cross-dimensional Operations: The system should recognize
dimensional hierarchies and automatically perform roll-up and drill-down
operations within a dimension or across dimensions.
10)Intuitive Data Manipulation: Data manipulation fundamental to the
consolidation path, such as reorientation (pivoting), drill-down and roll-up,
and other manipulations, should be accomplished naturally and precisely via point-
and-click and drag-and-drop methods on the cells of the analytical model, avoiding
the use of menus or multiple trips to a user interface.
11)Flexible Reporting: Business users should be able to organize columns, rows,
and cells in a manner that facilitates simple manipulation, analysis, and synthesis
of data.
12)Unlimited Dimensions and Aggregation Levels: The number of data
dimensions should be unlimited. Each of these generic dimensions must allow
a practically unlimited number of user-defined aggregation levels within any
given consolidation path.
Characteristics of OLAP
In the FASMI characterization of OLAP systems, the term is derived from the first
letters of the following characteristics:
Fast
The system is targeted to deliver most responses to users within about five
seconds, with the simplest analyses taking no more than one second and very few
taking more than 20 seconds.
Analysis
The system should be able to cope with any business logic and statistical
analysis that is relevant for the application and the user, while keeping it easy
enough for the target user. Although some pre-programming may be needed, the
user must be able to define new ad hoc calculations as part of the analysis and to
report on the data in any desired way, without having to program. Products (like
Oracle Discoverer) that do not allow adequate end-user-oriented calculation
flexibility are therefore excluded.
Share
The system should implement all the security requirements for confidentiality
and, if multiple write access is needed, concurrent updates at an appropriate
level. Not all applications need users to write data back, but for the increasing
number that do, the system should be able to manage multiple updates in a
timely, secure manner.
Multidimensional
This is the basic requirement. OLAP system must provide a multidimensional
conceptual view of the data, including full support for hierarchies, as this is
certainly the most logical method to analyze business and organizations.
Information
The system should be able to hold all the data needed by the applications. Data
sparsity should be handled in an efficient manner.
The main characteristics of OLAP are as follows:
1. Multidimensional conceptual view: OLAP systems let business users
have a dimensional and logical view of the data in the data warehouse. It
helps in carrying slice and dice operations.
2. Multi-User Support: Since OLAP systems are shared, the OLAP
system should provide normal database operations, including retrieval,
update, concurrency control, integrity, and security.
3. Accessibility: OLAP acts as a mediator between data warehouses and
front-end. The OLAP operations should be sitting between data sources
(e.g., data warehouses) and an OLAP front-end.
4. Storing OLAP results: OLAP results are kept separate from data sources.
5. Uniform reporting performance: Increasing the number of
dimensions or database size should not significantly degrade the reporting
performance of the OLAP system.
6. OLAP provides for distinguishing between zero values and missing values
so that aggregates are computed correctly.
7. OLAP system should ignore all missing values and compute correct
aggregate values.
8. OLAP facilitate interactive query and complex analysis for the users.
9. OLAP allows users to drill down for greater detail or roll up for
aggregations of metrics along a single business dimension or across
multiple dimensions.
10.OLAP provides the ability to perform intricate calculations and
comparisons.
11.OLAP presents results in a number of meaningful ways, including charts
and graphs.
Benefits of OLAP
OLAP holds several benefits for businesses: -
1. OLAP helps managers in decision-making through the multidimensional
views of data that it efficiently provides, thus increasing their
productivity.
2. OLAP applications are self-sufficient owing to the inherent flexibility of
the underlying organized databases.
3. It facilitates simulation of business models and problems, through
extensive management of analysis-capabilities.
4. In conjunction with data warehouse, OLAP can be used to support a
reduction in the application backlog, faster data retrieval, and reduction in
query drag.
OLAP operations
In the multidimensional model, data is organized into multiple
dimensions, and each dimension includes multiple levels of abstraction described
by concept hierarchies. This organization provides users with the flexibility to
view data from various perspectives. A number of OLAP data cube operations
exist to materialize these different views, allowing interactive querying and
analysis of the data at hand. Hence, OLAP supports a user-friendly environment
for interactive data analysis.
Consider the OLAP operations which are to be performed on multidimensional
data. The figure shows a data cube for the sales of a shop. The cube contains three
dimensions: location, time, and item, where location is aggregated with
respect to city values, time is aggregated with respect to quarters, and item is
aggregated with respect to item types.
Roll-Up
The roll-up operation (also known as drill-up or aggregation
operation) performs aggregation on a data cube, either by climbing up a concept
hierarchy or by dimension reduction. Roll-up is like zooming out on the data
cube. The figure shows the result of a roll-up operation performed on the dimension
location. The hierarchy for location is defined as the order: street < city <
province or state < country. The roll-up operation aggregates the data by ascending
the location hierarchy from the level of the city to the level of the country.
When a roll-up is performed by dimension reduction, one or more dimensions
are removed from the cube. For example, consider a sales data cube having two
dimensions, location and time. Roll-up may be performed by removing the time
dimension, resulting in an aggregation of the total sales by location, rather
than by location and by time.
Example
Consider the following cubes illustrating temperature of certain days
recorded weekly:
Temperature 64 65 68 69 70 71 72 75 80 81 83 85
Week1 1 0 1 0 1 0 0 0 0 0 1 0
Week2 0 0 0 1 0 0 1 2 0 1 0 0
Consider that we want to set up levels (hot (80-85), mild (70-75), cool (64-69))
in temperature from the above cubes.
To do this, we have to group the columns and add up the values according to the
concept hierarchy. This operation is known as a roll-up.
By doing this, we obtain the following cube:
Temperature cool mild hot
Week1 2 1 1
Week2 1 3 1
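The same roll-up can be sketched in a few lines of Python. The weekly readings below correspond to the non-zero entries of the detailed cube above and are grouped into the higher-level concepts cool / mild / hot; the code is only an illustration of the grouping step.

```python
# Roll-up sketch: detailed temperature readings are mapped to the higher-level
# concepts cool / mild / hot and counted per week.
from collections import Counter

def level(temp):
    if 64 <= temp <= 69:
        return "cool"
    if 70 <= temp <= 75:
        return "mild"
    if 80 <= temp <= 85:
        return "hot"
    return "other"

readings = {
    "Week1": [64, 68, 70, 83],
    "Week2": [69, 72, 75, 75, 81],
}

rolled_up = {week: Counter(level(t) for t in temps) for week, temps in readings.items()}
for week, counts in rolled_up.items():
    print(week, dict(counts))
# Week1 {'cool': 2, 'mild': 1, 'hot': 1}
# Week2 {'cool': 1, 'mild': 3, 'hot': 1}
```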
Drill-Down
The drill-down operation (also called roll-down) is the reverse of roll-
up. Drill-down is like zooming in on the data cube. It navigates from less
detailed data to more detailed data. Drill-down can be performed by
either stepping down a concept hierarchy for a dimension or by introducing
additional dimensions.
Figure shows a drill-down operation performed on the dimension time by
stepping down a concept hierarchy which is defined as day, month, quarter, and
year. Drill-down occurs by descending the time hierarchy from the level of the
quarter to a more detailed level of the month.
Because a drill-down adds more details to the given data, it can also be performed
by adding a new dimension to a cube. For example, a drill-down on the central
cubes of the figure can occur by introducing an additional dimension, such as a
customer group.
Example
Drill-down adds more details to the given data (day-level counts for each
temperature level):
Temperature cool mild hot
Day 1 0 0 0
Day 2 0 0 0
Day 3 0 0 1
Day 4 0 1 0
Day 5 1 0 0
Day 6 0 0 0
Day 7 1 0 0
Day 8 0 0 0
Day 9 1 0 0
Day 10 0 1 0
Day 11 0 1 0
Day 12 0 1 0
Day 13 0 0 1
Day 14 0 0 0
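The sketch below shows drill-down as the reverse of roll-up: from the week-level summary we step back down the time hierarchy to the day-level records. The day-to-week assignment and the fact rows are illustrative (taken loosely from the non-zero entries above), not a full reproduction of the table.

```python
# Drill-down sketch: step down the time hierarchy from weeks to days.
facts = [  # (day, temperature_level, count)
    ("Day 3", "hot", 1),
    ("Day 4", "mild", 1),
    ("Day 5", "cool", 1),
    ("Day 7", "cool", 1),
]
day_to_week = {f"Day {d}": "Week1" if d <= 7 else "Week2" for d in range(1, 15)}

# Rolled-up (week-level) view
week_view = {}
for day, lvl, count in facts:
    key = (day_to_week[day], lvl)
    week_view[key] = week_view.get(key, 0) + count
print(week_view)   # {('Week1', 'hot'): 1, ('Week1', 'mild'): 1, ('Week1', 'cool'): 2}

# Drill-down: return to the more detailed day level for Week1
print([f for f in facts if day_to_week[f[0]] == "Week1"])
```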
Slice
A slice is a subset of the cube corresponding to a single value for one or more
members of a dimension. For example, a slice operation is executed when the
user wants a selection on one dimension of a three-dimensional cube,
resulting in a two-dimensional slice. So, the slice operation performs a selection
on one dimension of the given cube, thus resulting in a subcube.
For example, if we make the selection, temperature=cool we will obtain the
following cube:
Temperature Cool
Day 1 0
Day 2 0
Day 3 0
Day 4 0
Day 5 1
Day 6 1
Day 7 1
Day 8 1
Day 9 1
Day 11 0
Day 12 0
Day 13 0
Day 14 0
The following diagram illustrates how slice works.
Here, slice is performed on the dimension "time" using the criterion time =
"Q1".
It forms a new sub-cube by selecting one or more dimensions.
Dice
The dice operation defines a subcube by performing a selection on two or more
dimensions.
For example, applying the selection (time = day 3 OR time = day 4) AND
(temperature = cool OR temperature = hot) to the original cube, we get the
following subcube (still two-dimensional):
Temperature Cool hot
Day 3 0 1
Day 4 0 0
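Slice and dice can also be sketched as simple selections over the day-level records. The fact rows below are illustrative; the slice fixes one dimension to a single value, while the dice keeps several values on two dimensions, reproducing the small subcube shown above.

```python
# Slice and dice sketch: selections over day-level cube cells.
facts = [  # (day, temperature_level, count)
    ("Day 3", "hot", 1),
    ("Day 3", "cool", 0),
    ("Day 4", "cool", 0),
    ("Day 4", "hot", 0),
    ("Day 5", "cool", 1),
]

# Slice: temperature = 'cool' -> a lower-dimensional view over days only
slice_cool = [(day, count) for day, lvl, count in facts if lvl == "cool"]
print(slice_cool)   # [('Day 3', 0), ('Day 4', 0), ('Day 5', 1)]

# Dice: (day = 'Day 3' OR day = 'Day 4') AND (level = 'cool' OR level = 'hot')
dice = [f for f in facts
        if f[0] in ("Day 3", "Day 4") and f[1] in ("cool", "hot")]
print(dice)         # the subcube shown in the text
```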
Types of OLAP
There are three main types of OLAP servers, as follows:
ROLAP (Relational Online Analytical Processing)
• The ROLAP storage mode causes the aggregations of a partition to be stored in
indexed views in the relational database that was specified in the partition's
data source.
• ROLAP does not cause a copy of the source data to be stored in the Analysis
Services data folders. Instead, when results cannot be derived from the query
cache, the indexed views in the data source are accessed to answer queries.
• Query response is frequently slower with ROLAP storage than with the
MOLAP or HOLAP storage modes. Processing time is also frequently slower
with ROLAP.
MOLAP (Multidimensional Online Analytical Processing)
• The MOLAP storage mode causes the aggregations of a partition and a copy of
its source data to be stored in a multidimensional structure in Analysis Services
when the partition is processed.
• The MOLAP structure is highly optimized to maximize query performance.
The storage can be on the computer where the partition is defined or on another
computer running Analysis Services. Because a copy of the source data resides
in the multidimensional structure, queries can be resolved without accessing
the partition's source data.
• Query response times can be reduced substantially by using aggregations. The
data in the partition's MOLAP structure is only as current as the most recent
processing of the partition.
HOLAP (Hybrid Online Analytical Processing)
• The HOLAP storage mode combines attributes of both MOLAP and ROLAP.
Like MOLAP, HOLAP causes the aggregations of a partition to be stored in a
multidimensional structure in a SQL Server Analysis Services instance.
• HOLAP does not cause a copy of the source data to be stored. For queries that
access only the summary data in the aggregations of a partition, HOLAP is the
equivalent of MOLAP.
• Queries that access source data (for example, drilling down to an atomic cube
cell for which there is no aggregation data) must retrieve data from the
relational database and will not be as fast as they would be if the source data
were stored in the MOLAP structure.
Difference between ROLAP and MOLAP
ROLAP MOLAP