0 ratings0% found this document useful (0 votes) 30 views34 pagesData Warehousing Notes
Data warehousing notes unit 1 full notes for 3 rd year students
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content,
claim it here.
Available Formats
Download as PDF or read online on Scribd
UNIT I
Introduction to Data
Warehouse
Syllabus
Data warehouse Introduction - Data warehouse components- operational database Vs data
warehouse ~ Data warehouse Architecture ~ Three-tier Data Warehouse Architecture - Autonomous
Data Warehouse- Autonomous Data Warehouse Vs Snowflake - Modern Data Warehouse.
Contents
14
12
18
14
18
16
17
18
19
Introduction
History of Data Warehouse
Data Warehouse Tools
Need for Data Warehouse
Benefits of Data Warehouse
Data Warehouse Working ?
Data Warehouse Components
Operational Database Vs Data Warehouse
Data Warehouse Architecture
1.10 Autonomous Data Warehouse
1.11. Snowflake : A Different Date Warehouse Architecture
1.12 Two Marks Questions with Answers
aaData Warshousing 1-2 Introduction to Data Warehouse
Introduction
© The business firms often generate billions of bytes of data every day on all parts of
company, which contains millions of unique details about their clients, personnel and
operations. However, access to this data is severely restricted and sealed up. Executives
and decision - makers have access to a very limited portion of the data that is actually
gathered, processed, and stored in the company.
© A new technology has recently developed from new ideas and tools, enabling all the
essential players in an organization to have access to the knowledge they need to thrive
in a world that is becoming more and more competitive. "Data warehousing” is the
name given to this new technology. I will be talking about the fundamental ideas and
vocabulary of data warehousing in this unit,
© Our first desktop "What if” processing test was with the Lotus. The goal of a data
warehouse is to leverage the information of company has acquired to help it respond
more effectively, intelligently, quickly and efficiently.
‘© Modern organizations depend on the efficient gathering, storing and integrating of data
from many sources for analysis and insights. These data analytics roles are now vital to
cost reduction, revenue growth and profit maximization. Therefore, it is not surprising
that both the amount of data that has been collected and analyzed as well as the variety
and number of data sources have expanded,
* Data-driven enterprises need robust solutions to handle and analyze massive volumes of
data throughout their whole organization. These solutions are reliable, scalable and
secute enough for regulated industries while supporting a broad variety of data formats
and use cases. Any typical database would struggle to meet the needs. The data
warehouse may be of assistance here.
‘© A data warehouse is created by combining data from several sources. Ad hoe or planned
inquities, analytical reporting and decision - making are made easier. This course uses @
step-by-step technique to teach every fundamental data warchousing theory.
‘* A Relational Database Management System (RDBMS) concept called a data warehouse
was created to address the needs of transaction processing systems. Any centralized data
store that may be searched for commercial advantages can be broadly characterized as
such, It is a database that holds data intended to satiate inquiry - based decision -
making, It is a collection of technologies designed to aid in decision - making and
empower knowledge workers (executive, manager and analyst). In order to help
TECHNICAL PUBLICATIONS? - an upthrust or nowedyeData Warehousing 21:3 Introduction to Data Warehouse
‘corporate leaders consistently organize, comprehend and apply their information to
‘make strategic choices, data warehousing supports structures and tools.
«A Data Warehousing (DW) process is used to gather and manage data from many
sources in order to provide insightful business information. Business data from many
sources is often connected and analyzed using a data warehouse. The central component
of the BI system, which is designed for data analysis and reporting, is the data
warehouse.
The combination of several technologies and elements facilitates the strategic use of
data, Large amounts of data are electronically stored by a company and are intended for
analysis and inquiry rather than transaction processing, It is a process of converting data
into information and promptly making it accessible to people so that it might have an
impact.
© An OLAP engine, customer analysis tools, an Extraction, Transportation and Loading.
(ETL) solution, as well as additional applications that manage data collection and
delivery to business users are all included in the data warehouse environment.
What is Data Warehouse ?
© A data system having the following characteristics might be thought of as a data
‘warehouse :
© It isa database that uses information from many apps to-be used for investigative
activities,
© It allows for reasonably lengthy exchanges between a limited number of
‘customers.
© Itcontains both recent and old data to give information a historical context.
© Its use involves a lot of reading
© There are a couple big tables in there.
© "Data Warehouse is a subject - oriented, integrated and time - variant store. of
information to support management's decisions."
+ A Data Warehousing (DW) process is used to gather and manage data from many
sources in order to provide pertinent business information. Business data from many
sources is routinely linked together and analyzed using a data warehouse.
© Data from many sources, including as point-of-sale operations, marketing automation,
relationship management and more, is processed and reported on using an enterprise
system called a data warehouse. Ad hoc analysis and customized reporting are both
TECHNICAL PUBLICATIONS®- an up-tust for rowedData Warehousing 1-4 Introduction to Data Warehouse,
appropriate uses for data warehouses. A data warehouse is a crucial part of business
intelligence since it was created to provide a long-term perspective on data throughout
time and can store both current and historical data in one location.
History of Data Warehouse
© Let us first review the historical management schemes of the analysis data and the
factors that have led to the evolution of the data warehousing application class,
Traditional approaches to historical data
‘© The operational systems and the data they process have received the majority of
attention throughout the development of systems, Since it was impractical to maintain
data in operating systems permanently, a framework for archiving the data such systems
had processed was only conceived of after the fact. Operational systems need
performance, but analysis systems need flexibility and a wide range of capabilities.
These are the essential needs that distinguish operational systems from analysis systems.
Data from legacy systems
Over the last three decades, several platforms have evolved with the evolution of
computer technology. Business system development was carried out on IBM mainframe
computers in the 1970s utilizing programs like Cobol, CICS, IMS, DB2, etc.
¢ The 1980s saw the development of computer platforms like the AS / 400 and VAX /
VMS. The client / server architecture, which is still in use today, was introduced by the
server platform UNIX in the late 1980s and early 1990s,
+ The mainframe environment from the 1970s is still used to operate a significant number
of commercial applications, despite all the changes in platforms, architectures, tools and
technology, The main reason is that through time, these systems have accumulated
business knowledge and regulations that are very challenging to transfer to a new
platform or application. These are often referred to as legacy systems. Such technologies
eventually make the data they store distant and challenging to access.
Extracted inform:
ion on the desktop
© The personal computer has significantly increased in use for business analysis during the
last ten years. Many of the tools needed to utilize spreadsheets for analysis and graphical
depiction are now available to business analysts. Advanced users will regularly work
with and store data retrieved from the historical sources using desktop database tools
© The above has the drawback of leaving the data fragmented and focused on very
particular requirements. Only the information that each individual user needs has been
acquired. The needs of many consumers and uses carinot be met by the extracts.
TECHNICAL PUBLICATIONS? - an up-trust for knowledgeData Warehousing 1:6 Introduction to Data Warehouse
‘Addressing the needs of a single user requires a lot of effort and money. The drawbacks
experienced prompted the creation of the new application known as data warehousing.
Factors, which lead to data warehousing
* The data warehousing field has quickly developed due to a variety of variables. The
development of hardware and software technologies has been the most crucial element.
* Costs of hardware and software Prices for hardware and software have significantly
decreased, Cheaper memory chips with a higher capacity are readily accessible.
‘© Strong preprocessors : Preprocessors nowadays are far more powerful than mainframes
of the past, such as Pentium III and Alpha processors.
« Budget-friendly disks : As hard disk costs decrease, they can now hold hundreds of
gigabytes. In the 1970s and the early 1980s, it would have taken an entire roomful of
disk drives to store the amount of data that can now be stored on a single disk drive that
is just one inch high.
‘© Strong desktop analysis tools desktops can do tasks that the old - school mainframe
computers couldn't, including multi - tier computing, client / server architecture, or
user - friendly GUI interfaces.
«Server software : When compared to earlier versions, server software is now reasonably
priced, robust and simple to maintain.
‘© An example of this is Windows NT, which has made the configuration of powerful
computers incredibly simple and cheaper.
© The most significant factor in the development of data warehouses has been the
exponential growth of hardware and sofiware capabilities, together with the accessibility
of low - cost, user - friendly reporting and analytical tools.
Emergence of standard business applications,
'* Users may now get popular business application packages from new suppliers. The
German software companies SAP AG, Baan, PeopleSoft and Oracle have all released
software suites with varying capabilities but similar funetions. Standard applications are
offered by these application suites, which may take the place of the old custom -
developed apps now in use.
‘© This is what has caused such programs to become more popular.
‘+ Additionally, compared to mainframes, these applications’ data gathering is far more
straightforward.
TECHNICAL PUBLICATIONS® - an vptrt for knowledgeData Warehousing 1-6 Introduction to Data Warehouse
End - user more technology oriented
¢ The development of a technology - focused business analyst is one of the most
significant outcomes of the tremendous investment in technology and march toward the
powerful personal computer. Even while technology - focused end users are not always
advantageous to all projects, this trend has undoubtedly generated a crop of business
analysts who are at the foreffont of technology who are now crucial to today’s
businesses. These technologically savvy end customers have typically contributed
significantly to the design and implementation of data warehouses. They are now the
primary users who are the first to highlight the early advantages of data warchouses.
These end users are essential for the growth of the data warehouse model because they
teach other users as they gain expertise with the system.
‘© Users may conduct more precise analyses and improve the functioning of their
organizations by using the data warchouse. As computers got more advanced and
required to process more and larger amounts of data, the necessity to store data
developed. But data warehousing is hardly a brand-new concept.
Following are some key events in evolution of data warehouse -
‘+ 1960 - The phrases measurements and facts are created in 1960 as a result of a
cooperative research initiative between Dartmouth and General Mills
‘+ 1970 - Retail sales dimensional data marts are introduced by Nielsen and IRI.
«1983 - A system for managing databases created with decision assistance in mind is,
offered by Tera Data Corporation.
© When IBM employees Paul Murphy and Bary Devlin created the Business Data
‘Warehouse in the late 1980s, data warehousing was bom.
‘* However, Inman Bill provided the actual notion. He was revered as the inventor of the
data warehouse. He has written on a range of subjects related to the construction,
operation and upkeep of the warehouse and the corporate information factory.
Characteristics of Data Warehousing
© Typically, a data warehouse has four characteristics, according to W. H. Inmon, author
of Building the data warehouse and the expert who is usually regarded as the concept's
creator
‘© A data warehouse, according to W. H. Inmon, is "a subject - oriented, integrated,
nonvolatile, time - variant collection of data to support management's decisions." Access
to data for in-depth analysis, knowledge discovery and decision - making is made
possible by data warehouses,
TEGHNIGAL PUBLICATIONS? - an up-hrst for rowedData Warehousing 4-7 Introduction to Data Warehouse
Subject oriented
‘© The organization of data is subject - based rather than application - based, for example,
‘an insurance business employing a data warehouse would arrange its data by customer,
‘premium and claim rather than by various produets (cat, life, ete.). The subject - specific
data only includes the details required for decision support processing.
Integrated
‘© Encoding of data is often uneven when it is stored in different financial apps in the
operational environment. For instance, although gender may be classified as "m" and "f"
in ‘one application, 0 and | in another. When data are transferred from the operational
environment to the data warehouse, they are assumed to follow a uniform coding
scheme, for example, gender data is changed to"m" and "f."
Change in time
‘© Data that are five to ten years old or older may be stored in the data warehouse and
utilized for comparisons, trends and forecasts. These statistics are outdated.
Non-volatile
© Once they are entered into the data warehouse, the data are merely accessed - they are
not updated or altered in any way.
«These specific qualities define data warehouses.
© Acconceptual multidimensional perspective.
Universal dimension.
Limitless aggregation and dimension levels.
0 0 0
‘Unrestricted operations across dimensions.
Handling of dynamic sparse matrices.
‘The client - server architecture.
Multi - user assistance.
Availability.
Openness
0.000
Intelligent data manipulation,
Accurate performance in reporting.
0000
Flexible reporting
‘© Data warehouses are often an order of magnitude (and pethaps two orders of magnitude)
bigger than the source databases since they contain massive amounts of data
TECHNICAL PUBLICATIONS® - an up-to knonfedy=Data Warehousing 1-8 Introduction to Data Warehouse:
Enterprise - wide data warehouses, virtual data warehouses and data marts have all been
used to address the problem of the sheer amount of data, which is most likely to be
‘measured in terabytes
‘© Enterprise - wide data warehouses are huge initiatives requiring substantial time and
resource investments.
+ Virtual data warehouses provide users access to materialized views of operational
databases.
* Data marts are often more narrowly focused and aimed to a certain group of the
company, like a dependent.
* In order to summarize the aforementioned, the following key elements about distinct
data warehouse features should be kept in mind
© Subject - focused
‘* Organized around key topics, such customers, products and sales.
Concentrating on data modeling and analysis for decision - making rather than routine
tasks or transaction processing,
* Provide a clear and brief overview of a certain topic by excluding information that is not,
relevant to the decision - making process.
Integrated
‘Built by combining several, disparate data sources, such as relational databases, flat files
and online transaction records.
‘Offering methods for data integration and cleansing.
Time variation
* Compared to operational systems, the data warehouse has a much longer time horizon.
© Every key structure in the data warehouse has an explicit or implicit reference to time.
‘© Non - volatile data that has been changed from the operational environment that is
physically stored separately and doesn't need transaction processing, recovery, or
concurrency control methods.
Only requires the initial loading of the data and the access of the data (no data updates
are necessary),
Data Warehouse Tools
© The tools that enable peroper sourcing of data formats and céntents from external data
storage into the data warehouse must carry out a number of crucial functions, including :
‘© Integration and consolidation of data.
E TECHNICAL PUBLICATIONS - an up-rus for knowledgeData Warehousing 1-9 Introduction to Data Warehouse
© Data translation between different forms
© Based on the function of business rules that néed transformation, . data
transformation and computation.
© Synchronization and maintenance of metadata, which includes archiving or
updating infortnation about events, loading formats, transformation activities and
source files.
‘© There are several selection criteria which should be considered, while implementing a
data warehouse
© It is required to be able to recognize the data that the tool can read from the data
source environment.
© Itis crucial to provide support for flat files, indexed files and older DBMSS.
In many setups, the ability to combine records from several data stores is
necessary.
© Conversation and the specification interface, which indicates the information to be
extracted, are crucial
© Itis desirable to be able to read data from repository products or data dictionaries.
©The tool's developed code need to be entirely maintained,
© Users may extract just the necessary data using selective data extraction for both
data items and records.
© For the conversion of data into information, a field - level data inspection is
required.
© When transferring data across disparate systems, the capacity to execute data type
and character - set translation is necessary.
© Aggregation, summarization and derivation fields and records must be able to be
created.
© Vendor reliability and product support are important factors that need to be
carefully considered.
Goals of data warehousing
«To support reporting and analysis,
«Keep the organization's historical records up to date
+ Serve as the basis for decision - making.
"TECHNICAL PUBLICATIONS® an vptnut for knowledgeData Warehousing 1-10 Introduction to Data Warehouse
Need for Data Warehouse
‘© The following justifies the need for a data warehouse
1. Business ser : To access historical data summaries, business users need a data
warehouse. These non - technical persons could just understand the information.
2. Archive historical data: A data warehouse is required to archive historical
time-variable data. This input is made to do a number of different tasks.
3. Make strategic choices : Several approaches may be based on the data in the data
warehouse. Data warehouse aids in strategic decision - making as a result.
4, For uniformity and consistency in data : The user may effectively work to give
‘uniformity and consistency in data by merging data from multiple sources.
5. Quick reaction times : Data warehouses need to be flexible and quick to respond in
order to handle erratic loads and query patterns.
Benefits of Data Warehouse
© Recognize market trends to enhance our forecasts.
Data warehouses are designed to manage enormous amounts of data.
End users can navigate, understand and query data warehouse architectures more easily.
Complex queries that would be challenging to build and handle in many normalised
databases could be made easier by data warehouses.
Data warehousing is a useful method for managing the demand for a lot of information
from a lot of customers.
+ Data warehousing enables the analysis of a substantial amount of historical data. What Is
a Data warehouse used for ?
© Here are the industries where data warehouses are most often used
© Airline : It is utilised for operational purposes in the airline system, such as personnel
assignment, studies of route profitability, frequent flyer programme promotions, ete
‘+ Banking : It is often used in the banking industry to efficiently manage the resources on
the desk. A few banks are also utilized for operations, product performance monitoring,
and market research.
‘© Healthcare : Data warehouses were also utilized by the healthcare industry to plan and
forecast results, provide patient treatment reports and communicate data with affliated
insurance firms, medical assistance organizations, ete.
TECHNICAL PUBLICATIONS? - an up-tvus fr knowledgeData Werehousing 4-1 Introduction to Data Warehouse
'® Government sector ; Data warehouses are utilised for intelligence collection in the
public sector. It aids in the upkeep and analysis of each person's tax data and health
insurance records by government authorities.
Sectors of investment and insurance
‘© In this industry, warehouses are largely used to watch market trends, analyse consumer
trends and analyse data patterns.
Keep the chain :
‘© Data warehouses are often utilised in retail chains for distribution and marketing.
‘Additionally, it aids in keeping track of products, consumer purchasing trends,
promotions and pricing strategy.
Telecommunication :
‘© In this industry, distribution choices, sales decisions and product marketing decisions
are all made using a data warehouse.
The hospitality sector
© Based on customer feedback and travel habits, this industry uses warehousing services
to plan and predict the locations for its advertising and marketing efforts.
Data Warehouse Working ?
‘Steps to implement data warehouse
© Using a three - pronged approach, is the best method to reduce the business risk
connected with the development of a data warehouse.
© Enterprise strategy In this section, we define technical terms such as existing
architecture and technologies. Additionally, we define facts, dimensions and
characteristics. Additionally passed are data transformation and mapping.
© Phased implementation : Data warehouse implementation should be done in accordance
with topic areas. Booking and billing are examples of related business entities that
should be developed first before being connected,
‘© Iterative prototyping : Developing and testing the data warehouse iteratively is
preferable than a big bang approach to implementation. .
# ‘One or more data sources provide their data to a data warehouse, which acts as a central
store for the data. The transactional system and other relational databases send data to
data warehouse.
TECHNICAL PUBLICATIONS® - an upstivust for knowledgeData Warehousing 1-12 Introduction to Data Warehouse
Data may be :
1. Structured
2. Semi - structured
3. Unstructured data
* Users may access the converted data in the data warehouse using business intelligence
tools, SQL clients and spreadsheets once the data has been changed, transformed and
ingested. Data from several sources is merged in a data warehouse to produce a large
database. An organization may analyses its clients more thoroughly by combining all of
this data.in one location. This makes sure that all the information is taken into account.
Data mining is made feasible by data warehousing. Data mining searches for patterns in
the data that might result in increased revenue and profitability.
Advantages of data warehouse
* Business users may easily access crucial data ftom a variety of sources using data
warehouses.
+ Consistent data on’ multiple cross - functional operations is provided via data warehouse.
Ad hoe reporting and querying are also supported.
* To lessen the strain on the production system, data warehouses assist in integrating
several data sources.
‘Using a data warehouse may speed up analysis and reporting overall.
‘The user may utilize it more easily for reporting and analysis thanks to restructuring and
integration,
© Users may obtain crucial data from several sources in a single location thanks to data
warehouses. As a result, it saves users’ time while obtaining data from various sources.
‘© A substantial quantity of historical data is kept in data warehouses. Users may use this to
analyze various historical periods and patterns to forecast the future.
Disadvantages of data warehouse
© Appoor choice for unstructured data,
© The development and implementation of a data warehouse are undoubtedly time-
consuming tasks.
* Data Warehouse may easily become out of date.
© Changes to data types and ranges, data source structure, indexes and searches are
challenging.
TECHNICAL PUBLICATIONS® - an up-rust for knowledgeData Warehousing 4-13. Introduction to Data Warehouse
‘© Although the data warehouse may seem simple, most consumers would find it to be
excessively complicated.
© The scope of a data warehousing project will constantly expand, even the finest project
management efforts.
‘© Users of warehouses may sometimes create unique business rules.
‘© Organizations must invest a significant amount of their resources in training and
implementation.
Data Warehouse Components
«Users may access the converted data in the data warehouse using business intelligence
tools, SQL clients and spreadsheets once the data has been changed, transformed and
ingested, Data from several sources is merged in a data warehouse to produce a large
database.
© Fig. 1.7.1 shows the components of a data warehouse.
oS
=a
Metadata
Fig. 1.7.4
‘© Central database : The data warehouse is a database. They were typically local or cloud -
based common relational databases. However, in - memory databases are quickly
gaining acceptance due to big data, the need for true, real - time performance and a
sharp drop in RAM prices.
‘© Data integration : Several data integration methods are used to extract data from source
systems, change it and align it for easy analytical consumption. ETL (Extract,
Transform, Load) and ELT procedures are among them, as well as real-time data
replication, bulk - load processing, data transformation and services for data quality and
enrichment.
«Data about our data is known as metadata, All of the characteristics of the data sets in
data warehouse - their origin, purpose, values and other details - are recorded, Technical
metadata describes where and how data should be stored, as well as how to get it.
Business metadata provides the céntext of data.
TECHNICAL PUBLICATIONS® - an uphrust for nowedye4 Introduction to Data Warehouse
Data Warehousing
© Tools for accessing data warehouse : With the help of these tools, users may interact
with the data inside, A few examples of access tools are data mining, OLAP, query and
reporting tools and application development tools.
The best approach to assemble the pieces is via architecture. We combine hardware and
software elements to create a data warchouse, We reserve the right to shift equipment
and services to another site, however these structures are built to meet the needs of our
organizations. They are all dependent on the situation we are in.
/ ‘mining
Source data
Management and conteo
5
:
3
a sS
: Ss E
= ‘Multi- \-——
| dine _
0 AW@@ga
S =| \e
== Report / Query
Data staging
Fig. 1.7.2 Data warehouse component
* The graphic shows the essential elements of a typical warehouse. The source data
component is located on the left. The data staging component is the following building
element, In the middle is the data storage component, which controls the data in the data
warehouses. Along with storing and managing data, this component also manages and
tracks data via the metadata repository. The information delivery component on the right
displays the many ways that users can obtain data from data warehouses.
Source Data Component
* Following are the categories used to classify the source data entering the data
‘warehouses :
1. Production data : These kinds of data come from the many operating systems used by
the company. Based on the data requirements in the data warehouse, we choose data
segments from the various operational modes.
2, Internal data : Databases, reports, customer profiles and occasionally even
TECHNICAL PUBLICATIONS® - an uptrut fr inowladgsData Warehousing 4-15 Introduction to Data Warehouse
spreadsheets are kept "private" by each client organization, This is internal data,
some of which may be included into a data warehouse
3, Archived information : Operational systems’ main objective is to manage the current
business. Every operating system has old data that we regularly erase and store in
achieved files.
4, Extemal information : For the information they use, the majority of CEOs rely
extensively on data from outside sources. They make use of statistics produced by a
different department that are particular to the sector.
Data Staging Component
‘© Data must be extracted from several operating systems and extemal sources before
being processed for storage in the data warehouse. For the purpose of querying and
analysis, it is required to modify, cliange and prepare the data that has been obtained
from various sources in an acceptable manner.
© The three main tasks performed in the staging area will now be covered.
1) Data extraction : This technique must handle a variety of data sources. Each data
source requires a different set of methodologies, which we must use.
2) Data transformation : It is well known that data for a data warehouse comes from a
number of sources. For a data warehouse, data transformation poses even more
challenges than data extraction. We do a variety of different actions to change data.
‘© First, the information that was obtained from each source is cleansed, When merging
data from several sources, eliminating duplicate entries or setting default values for
missing data elements are all examples of cleaning.
© Standardizing data component components is a crucial part of data transformation. Data
transformation may take many different forms when combining information from
several sources, We combine relevant data from many sources or data ftom a single
source record.
# Data transformation, on the opposite hand, also entails separating outsourced records
into fresh combinations and deleting pointless source data, Data are massively sorted
‘and combined at the data staging area. When the data transformation function is
finished, we get a set of integrated, purified, standardized and summarized data.
‘* Data loading : The two sorts of actions that make up data loading functions are different.
When the data warehouse's structure and construction are complete and it first goes
online, the data is initially loaded into the storage area of the data warehouse, At first,
large volumes of data must be transported, which takes time.
TECHNICAL PUBLICATIONS? -an up-tiust for nowedyeData Warehousing 1-16 Introduction to Data Warehouse
Data storage hardware
* An information split repository is used in data warehousing. Most often, the operating
systems data vaults only hold the most current information. These data repositories also
provide entirely normalized data that is arranged for quick processing. Information
delivery component.
© The process of subscribing for data warehouse files and having data transported to one
or more destinations in accordance with some scheduling algorithm set by the client is
made possible by the information delivery element.
Component of metadata
+ A data warehouse's metadata is equivalent to a database management system's data
dictionary or data catalogue. We save details about logical data structures, records and
addresses, details about indexes and other details in the data dictionary.
Data marts
‘+ Ttcontains a portion of corporate - wide data that is useful to a particular user group. The
focus is limited to a few carefully chosen topics. Although advancements in the data
warehouse business have made regular and incremental data dumps more feasible, data
in a data warehouse should be at least somewhat current but not necessarily up to the
‘minute, Data-marts often have organization and are smaller than data warehouses. A
data warehouse should be constructed together with multiple smaller, related data marts
for certain types of queries and reporting, according to current data warehousing trends.
Management and Control Component
* The services and operations within the data warehouse are coordinated by the
management and control elements. These parts regulate how the data is transformed and
how it is sent to the data warehouse storage. On the other side, it controls how
customers are sent data. It collaborates with database management systems and grants
permission for data to be properly preserved in repositories. It keeps track of how
information is sent into the staging system and then into the actual data warehouse
storage
Data warehouse tools
* There are many data warehousing tools are available in the market. Here, is some most
prominent one :
1, Mark logic : Data warehousing tool Mark Logic uses a variety of corporate
capabilities to simplify and accelerate data integration, This tool helps in carrying out
TECHNICAL PUBLICATIONS® - an updhrust for inowedgeDate Warehousing
1-17
Introduction to Data Warehouse,
very difficult search procedufes. It can query several sorts of data, including
relationships, documents and metadata.
2. Oracle : The major database in the sector is Oracle. There are several data
warehouse systems available, both on-premises and in the cloud. By improving
operational effectiveness, it aids in optimizing the client experience.
3, Amazon RedShift : A tool for data warehouses is Amazon Redshift. All forms of
data may be easily and affordably analyzed using eurrent BI tools and standard SQL.
‘Additionally, it permits the use of query optimization to conduct complicated
searches against petabytes of structured data.
EM Operational Database Vs Data Warehouse
St. Operational database Data warehouse
1. Hiigh - volume transaction processing is OLAP, or ouline analytical processing,
supported by operational systems is often supported by data warchousing
systems. High - volume transaction
processing is supported by operational
systems.
2. ‘Current data are often an issue for The majority of the time, data
operational systems. ‘warehousing solutions focus on
historical data.
Ea “According te need, operational systems Non - volatile data may often be
are mainly updated regularly updated. Once added, litte was altered.
4 Tris created for corporate operations _Itis intended forthe éxamination of
and real - time transactions, business metrics by domain, scope and
characteristics,
5 itis designed to do a limited number of Tt designed to handle heavy
straightforward operations, often workloads and sophisticated, erratic
adding or retrieving a single row per queries that access numerous rows per
table ata time, table
6 Tt employs validation data tables and is No real-time validation is necessary
designed to validate incoming data _—_due to the abundance of reliable,
uring transactions. consistent information.
7. ican accommodate thousands of
clients at once.
TECHNICAL PUBLICATIONS® - on up-hnst for nowedyeData Warehousing 1-18 Introduction to Data Warehouse
* The main goal of early database systems was to satisfy the requirements of operational
systems, which are often transactional in nature.
‘+ There are many different systems, such as general ledgers, accounts payable, financial
management, order processing, order entry and inventory.
* By their very nature, operational systems are largely focused on managing a particular
transaction. When, the consumer, put money into checking account, the banking
operational system is in charge of documenting the transaction to make sure the
associated debit shows up in account record, i.
* One order, one account and one inventory item are often the focus of an operational
system. An operational system often encounters predetermined events that demand for
quick access owing to their nature, Most transactions only include modest quantities of
data,
© The business requirements for an operational system mostly remain constant. The
programme that records the transaction and the application that restricts access to the
data, or the porting side of the banking company, seldom ever change. The data needed
in this kind of system must be up-to-date when a consumer starts @ transaction, A bank
‘must first be confident of your current balance before allowing a withdrawal.
© Data "Warehousing" away from operational systems
* The main idea behind data warehousing is that by isolating the data used for business
analysis from the data used in operational systems, it can be accessible more efficiently.
Over time, many of the factors causing this divergence have changed. In the past, to
reduce the performance effect on operating systems, older systems archived data onto
tapes when it became inactive and several analytical reports ran from these tapes or
‘mirror data sources. .
‘+ With the exception of the fact that they are now taken into more formal consideration
throughout the data warehouse design process, these reasons for separating operational
data from analytical data have not materially altered with the development of data
warehousing systems,
* Many business: analysis procedures have become considerably more intricate and
sophisticated as a result of technological advancements and changes in the nature of
company. Modern data warehousing systems are capable of creating regular reports as
well as more complex online analysis, such as multi - dimensional analysis.
TECHNICAL PUBLICATIONS® - an uptrut for hnowldgeData Warehousing 1-19 Introduction to Data Warehouse
[EEE Data Warehouse Architecture
# In essence, a deta warehouse gives decision - support programs access to past data.
Reporting, OLAP, Executive Information Systems (EIS), and data mining are examples
of such applications.
# A data warehouse, according to W. H. Inmon, the person who first coined the phrase, is
a centralized, integrated collection of information. Integrate here refers to the cleaning
up, merging, and redesigning. Depending on how many systems feed into a warehouse
and how differently they handle comparable information, this may be more or less
ifficut.
© However, the majority of businesses already have information repositories in their
production systems, many of which are centralized.
‘+ Not data warehouses, then. Actually, no,
‘© In terms of function and structure, data warchouses are different from production
databases or Online Transaction Processing (OLTP) systems. A data warehouse is
geared for data retrieval and reporting and is typically a read-only system, as opposed to
an OLTP system, which is created and optimized for data input and updates.
‘© While o data warehouse includes the data utilized for business analysis, an OLTP system
contains the data required for camying out day-to-day business activities. Data
components that may be missing or unknown at the time of input are common in current
and highly volatile OLPT data. Historical, nonvolatile data that has been corrected for
transaction mistakes may be found in a warehouse. OLPT systems and data warehouses
require various data - modeling techniques since their goals are so dissimilar. Due to the
difficulty of updating OLTP systems with redundant data, redundancy is almost
nonexistent. As a result, OLPT systems are often built using a relational architecture and
are highly normalized. Redundancy, however, is preferred in data warehouses because it
facilitates user access and improves performance by reducing the number of tables that
‘must be connected.
© Some data warehouses choose @ multidimensional architecture over a. relational
approach altogether.
«© An adequate data model is required to discuss data warehouses and differentiate them
from transactional databases, The OLAP and decision - support technologies match the
‘multidimensional data model well. A data warehouse is commonly a repository of
integrated data from several sources that has been processed for storage in a
multidimensional model, as opposed to multi - databases, which provide access to
disconnected and typically heterogeneous databases. Data warehouses often offer
TECHNICAL PUBLICATIONS?
n pst for Arowed2Data Warehousing 1-20 Introduction to Data Warehouse
time - series and trend analysis, which both ask for more historical data than is
commonly kept in transactional databsses. This is in contrast to the majority of
transactional databases. Data warehouses are nonvolatile in comparison to transactional
databases, As a result, the data Warehouse's information changes much less often and
‘may be thought of as non-real-time with periodic updates.
‘* Transactions are the unit and the driving force behind database change in transactional
systems; by contrast, data warehouse information is considerably more coarsely grained
and is updated in accordance with a well - chosen reffesh strategy, which is often
incremental. The warehouse's acquisition component, which offers all necessary
reprocessing, manages warehouse updates.
* A set of decision support tools that are intended at assisting knowledge workers
(executive, manager and analyst) in making quicker and better choices is another way to
define data warehousing in general. An overview of a data warchouse's conceptual
framework may be seen in the accompanying Fig. 1.9.1 It displays the whole data
warehousing procedure. Before data is stored, this procedure may include cleaning and
reformatting. Following the procedure, OLAP, data mining, and DSS may provide fresh,
relevant data, such as rules; this data is shown in the image. Returning to the warehouse
figure demonstrates that files may be used as data sources.
Back fushing
Data werehouse
Data
‘Cleaning —= Reformatting = p——— na
Databases
moh
(Other data inputs
Updates /
new data
Fig. 1.9.41
* Apart from being trendy terms in the current IT sphere, Data Warehousing, Online
Analytical Processing (OLAP), and Decision Support Systems are the anticipated
outcomes of IT systems and present requirements. Long before the term “online
transaction processing" (OLTP) was coined, information management systems were
only concemed with collecting and storing data related to routine, basic transactions in
database management systems.
TECHNICAL PUBLICATIONS? «an upshrust for owedDeta Warehousing 1-21 Introduction to Data Warehouse.
‘¢ Managers and analysts must now go beyond the basic data storage stage and take use of
IT systems by presenting complicated queries, receiving analytical findings, and making
choices based on those results. The introduction of OLAP and Data Warehousing at this
point provides the corporate world with the system design, guiding principles,
methodological approach and - at long lest - tools required to support the presentation of
useful Decision Support Systems.
* LM. adopted the design and technique shown in the following image after close
collaboration with the academic community, which had just lately followed the
development of the commercial sector, which had been advancing and pioneering in the
field for the previous ten years. This is the output of the "Foundations of Data
Warehouse Quality - DWQ” Basic Research project, which was financed by
ESPRIT.Apart from being trendy terms in the current IT sphere, Data warehousing,
Online Analytical Processing (OLAP), and Decision Support Systems are the
anticipated outcomes of IT systems and present requirements, Long before the term
“online transaction processing" (OLTP) was coined, information management systems
were only concemed with collecting and storing data related to routine, basic
transactions in database management systems.
‘¢ Managers and analysts must now go beyond the basic data storage stage and take use of
IT systems by presenting complicated queries, receiving analytical findings and making
choices based on those results. The introduction of OLAP and data warehousing at this
point provides ‘the corporate world with the system design, guiding principles,
methodological approach, and - at long last - tools required to support the, presentation
of useful decision support systems.
© A data warehouse, also known as an OLAP system, is created by applying data
‘warehousing principles to conventional database systems and employing the right
design tools since it is fundamentally reliant on architecture. The design and
implementation of data warehouses and OLAP. systems follow the IMF's established
‘approach.
‘© Utilizing specialist data warehouse and OLAP systems, such as MicroStrategy's DSS
Series, the final deployment is carried out. One of the most well - known and respected
worldwide leaders in the field of data warehousing tools and systems, Micro strategy
Inc. provides solutions for every tier of the DW architectural hierarchy.
© The data warehouse architecture is a method for defining the overall architecture of data
exchanges, processing and presentation that exists for end - client computing inside the
organization. Despite the fact that every data warehouse is different, they nonetheless
"TECHNICAL PUBLICATIONS® - an up-tst fer knowledgeData Warehousing 1-22 Introduction to Data Warehouse
share a few crucial components. Online transaction processing (OLTP) is used in
production systems including payroll, accounts payable, product buying and inventory
‘management.
‘¢ These programmes gather extensive data on everyday operations.
+ Data warehouse systems are designed to assist online analytical processing (OLAP), a
‘more modem activity. These incliide forecasting, trend analysis, profiling, and summary
reporting tools.
© Production databases are routinely updated using OLTP software or manually. A
warehouse database, in contrast, gets regular updates ftom operating systems, usually
after business hours. As OLTP data accumulates in production databases, itis frequently
extracted, filtered and put onto a dedicated warehouse server that is accessible to users.
As the warehouse is filled with data, tables must be de-normalized, data must be
cleansed of errors and duplication and new fields and keys must be added to match the
user's requirements for sorting, combining and averaging data.
Users
Sources
Analysis
‘Operational
system
‘Mining
Fa fies
Fig. 1.9.2 Data warchouse architecture : basic
Principles of a data warehousing
Load efficiency
* Data warehouses need to continuously load new data within constrained time frames.
The speed of the load process should be measured in gigabytes and hundreds of millions
of rows per hour and it shouldn't artificially limit the amount of data that can be
processed.
TECHNICAL PUBLICATIONS? «en up-tvust fr krowisdgeData Warehousing 1-23 Introduction to Data Warehouse
Processing of loads
‘¢ To add new or updated data to the data warehouse, a number of processes must be
completed, including data conversion, filtering, reformatting, indexing and metadata
updating.
‘© Fact - based management requires the greatest level of data quality.
© Despite "dirty" sources and a large database, the warehouse must provide local
consistency, global consistency and referential integrity.
© Query performance The performance of the data warehouse RDBMS must not slow
down fact - based management; huge, sophisticated queries must be finished in seconds,
not days.
‘Scalability of terabytes
‘© The size of data warehouses is expanding at startling rates. These days, there are data
‘warehouses that are a few megabytes to hundreds of terabytes in size.
[EEX Types of Data Warehouse Architecture
4. Three-tier architecture
‘The three-tier architecture is composed of the source layer, which comprises several source
systems, the reconciliation layer and the data warehouse layer, which contains data
warehouses and data marts. The reconciliation layer sits between the data warehouse and the
source data. “
The main advantage of the reconciled layer is that it creates a consistent reference data
model for the whole business. Additionally, it distinguishes between issues with data
‘warehouse filling and those with source data extraction and integration. In some instances, the
reconciled layer is also used directly to improve the performance of specific operational tasks.
Examples include producing daily reports that camnot be adequately prepared using corporate
applications or creating data flows to regularly feed external processes in order to benefit from
cleaning and integration.
This approach is especially useful for big, enterprise-wide systems. One disadvantage of
this setup is the extra file storage space needed for the duplicate reconciling layer that is
added. As a consequence, the analytical tools are a bit less real - time,
TEGHNIAL PUBLICATIONS? - an up-rust for knowledgeData Warehousing 1-24 Introduction to Data Warehouse
ies aa Sarnia
i det)
Au He
Output
eee ee
on
sat
Seay
—— SS Boone,
one
s os
—
eae ef
as
Cees wee
Fig, 1.9.3 Three-tier architecture
* Data warchouses usually have a three ~level (tier) architecture that includes :
1, Bottom tier (Data warehouse server)
2, Middle tier (OLAP server)
3. Top tier (Front end tools).
* The foundational element of the system is a data warehouse server running an RDBMS.
Itcould include a repository for metadata and many specialized data marts.
* Data is retrieved from operational databases and other sources (such as user profile
information provided by extemal consultants) ‘through application programmed
interfaces called gateways. SQL code for server execution may be created’ by client
programmers using a gateway provided by the underlying DBMS.
+ An OLAP server in the intermediate layer for rapid data warchouse queries.
* In order to construct the OLAP server,
1) A Relational OLAP (ROLAP) paradigm, which transforms multidimensional data
fanctions into standard relational procedures, is an improved relational database
‘management s
tem,
TECHNICAL PUBLICATIONS® ~an up-thrust for knowledge,Data Warehousing 4225 Introduction to Data Warehouse
2) A Multidimensional OLAP (MOLAP) model, a specialized server that natively
handles multidimensional processes and data.
'A top-tier that contains front - end tools for displaying results from OLAP as well as tools
for data mining the data created by OLAP.
‘© Information that defines DW items is kept in the metadata repository. For middle - tier
and top-tier applications, it contains the following specifications and data
1. An explanation of the data warehouse structure, including the locations, contents,
hierarchies and warehouse schema.
2. Operational metadata, which generally indicates if a piece of data is active, archived,
or deleted, as well as warehouse monitoring information like consumption
information, error reports, audit findings, ete.
3. Index - containing system performance information is used to speed up data access
and retrieval.
Operational databases, which provide details on data cleansing and transformation
methods, mapping for source RDBMSS, ete.
4, Business information, such as ownership. details, terminologies and pre-configured
queries and reports.
4. Top-down approach :
External sources
Staging area
Fig. 1.9.4 Top - down approach
‘© The following ié a discussion of the crucial elements
© External sources - A location where data of any type is obtained is an ‘external souree.
‘Unstructured, semi - structured, or both types of data are all possible.
TECHNICAL PUBLICATIONS? - on up-brust for krowiedo®Data Warehousing 1-26 Introduction to Date Warehouse
‘Stage area - The data must be checked before being imported into the data warehouse
since it was obtained from external sources and does not follow a certain format. ETL
software should be used to do this
© E(Extracted) : Data was obtained from a third - party data source.
* Data is transformed, or T (Transform), into the approved format.
* Data is delivered into the data warehouse after being translated into the common format.
‘+ Data - warehouse - After being cleansed, data is stored in the data - warehouse as a
central repository. While it really stores the meta data, the true data is saved in data
‘marts. Remember that this top - down approach preserves the data in the data warehouse
in its most fundamental configuration.
* Data marts ; The data mart is a part of the storage component. It keeps track of
information for a single authority - managed organizational function. An organization
may have any number of data marts, depending on the functions. We may also say that
‘the data mart contains a fraction of the data stored in the data warehouse,
+ Studying the enormous amount of data kept in a data warehouse is known as data
mining. It is possible to find hidden patterns in databases or data warehouises by using a
data mining programmed,
‘* When the whole data warehouse has been completed, data marts are created from it,
according to Inmon, who claims that this technique leverages the data warehouse as the
organization's primary repository.
2. Bottom - up approach
External sources
Staging area
@©—]=
©
© Similar to a top - down approach, the data is initially obtained from external sources.
pemunont[
Be &
Fig. 1.8.5 Bottom - up approach
TECHNICAL PUBLICATIONS? - an up-thrust for owedData Warehousing 1-27 Introduction to Data Warehouse
‘* Following the staging area (as previously mentioned), the data are then loaded into data
‘arts rather than data warehouses. The data marts, which enable reporting, were created
initially. It concentrates on a certain industry.
‘Then, these data marts are linked to the data warehouse,
# According to Kinbal’s description of this approach, data marts are constructed initially
to provide analysts a limited viewpoint and a data warehouse is developed when
complete data marts have been established,
Autonomous Data Warehouse
* Oracle Cloud offers a suite of data management services based on self - driving Oracle
Autonomous Database technology. These services iriclude automating all necessary
database maintenance processes while the system is running, including patching,
upgrading and tuning.
Modern Data Warehouse
© A contemporary data warehouse is a technique for acquiring and storing the data that is
cloud - based. Organizations may analyze this data to help in decision - making. As a
result, many companies utilize contemporary data warehouses to improve their
operational, administrative and financial procedures. Quality cloud - based warehousing
departments need this information to make better decisions.
© Data analytics, reporting, data mining, machine learning and other Business Intelligence
(BD procedures may be performed using a data warehouse, a central data management
system that collects, stores and mixes data from many sources within an organization.
© To allow effective analysis that is accessible to everyone within an organization, the
data warehousing system collects, arranges and analyses data fiom many sources.
© Data warehouses have been existed since the 1980s, but as big data has become more
prevalent, they have seen significant transformation recently. Data warehouses are
progressively including tools for sophisticated analytics and data visualization.
1. Elements of a modern data warehouse
‘© There are several major components to today's data warehouses.
Infrastructure :
‘© In the past, businesses have switched their data from databases'to file systems to save
money. According to EMA's Santaferraro, they are now switching from file systems to
object storage.
TECHNICAL PUBLICATIONS®- an uptrust for knowledgeDsta Warehousing 1-28 Introduction to Data Warehouse
‘+ It's important to keep in mind that inexpensive storage has its limits in the area of
analytics, according to Santaferraro. "Cheap is not enough if the data are not available
for analysis."
'* Due to this, the UAW must provide a comprehensive and unified set of analysis
capabilities across all storage levels. When necessary, he added, more sophisticated
UAWs would automate the transfer of data into and out of file systems and object
storage.
Tools :
‘© Although many IT professionals link Hadoop with a data lake, there are numerous more
\widely used solutions that are generally free source. These consist of :
© Akey- value columnar database and storage solution called Apache HBase
© Astorage, table and metadata management solution called Apache HCatalog
©” Large datasets are often processed using the scalable data processing technology
Hadoop MapReduce.
© A MapReduce - based open - source language called Apache Hive that facilitates
the study of big datasets
© AMapReduce task scheduling tool called Oozie
© Apache Pig is a concurrent data processing language associated with MapReduce.
© Ahhierarchical key - value store for synchronization called Apache ZooKeeper
Cloud, multi-cloud and hybrid solutions :
‘* A ccontemporary data warehouse must enable cloud - based platforms as they are used by
the majority of organizations nowadays to store at least part of their data. To allow these
cloud platforms to exchange data, data warehouse should also offer cloud - to -cloud
interoperability. A contemporary data warehouse should provide interoperability across
various on - premises and cloud systems, enabling them to cooperate without isolating
data on any of the aforementioned platforms.
‘© As businesses look for solutions to reunite structured warehouse data with unstructured
data in the data lake, cloud platforms for UAWs have grown in popularity.
© There is now more interest in a UAW strategy as a result of the transition by many
organizations to a software - as - a - service model for corporate applications. Greater
scalability, agility, cost savings, quicker deployments, less complicated disaster
recovery and enhanced govemance and security capabilities are just a few of the
advantages that may result.
TECHNICAL PUBLICATIONS®- an upstust for knowledgeData Werehousing 1-29 Introduction to Data Warehouse
Computing and processing :
© To maximize an organization's infrastructure expenditures, certain cloud - based designs
that enable the contemporary data warehouse totally segregate computational activities
from storage. The flexibility to query data from any storage tier and complete isolation
may result in significant reductions in total cost of ownership.
‘© That is partially due to the fact that cloud suppliers often charge more for computation
than for storage (and naturally, the storage is required just for the compute - intensive
‘analytics procedures). So, if compute capacity can be reduced when not required, teams
may use simply the storage capacity and save money. Again, when computing capacity
is needéd by workloads, it may be dynamically spun up.
© Multi - tiered data storage is advantageous for analytical applications that need a lot of
data, The most cutting - edge systems provide complicated data types high -
performance methods in their native format.
[EA Modern Data Warehouse's Key Features
‘© In response to the advancement of cloud technology, which offers built-in scalability,
high availability, performance and flexibility, data warehousing has undergone
‘tremendous transformation during the last ten years.
‘* Conventional on - premises data warehouses may still be able to assist an organization
in achieving its objectives, but they are too costly and unable to manage the volume of
data that an organization is continually creating.
© A modem data warchouse allows to combine any sort of data, at any scale and instantly
provide business intelligence insights for all users. These tools include dashboards,
visualization tools and sophisticated analytics.
© A modem data warehouse likewise prioritizes value generation above transaction
activities and is mainly built for analytical needs.
‘Snowflake : A Different Data Warehouse Architecture
© The star schema is analogous to a snowflake schema. If one or more dimension tables
must link via other dimension tables in order to connect to the fact table, the schema is
said to be a snowflake.
© Each point in the star bursts into additional points in ‘the snowflake schema, which is an
‘extension of the star schema, The reason why it is termed the snowflake schema is
because the graphic looks like a snowflake. A technique for standardizing the dimension
tables in a STAR schema is called snow flaking. When all of the dimension tables are
TECHNICAL PUBLICATIONS® - an up-trust for knowledgeData Warehousing 1-90 Introduction to Data Warehouse
completely normalized, the resulting structure looks like a snowflake with the fact table
in the center,
* The development of the performance of certain queries uses snow flaking, Each fact in
the schema is surrounded by the relevant dimensions that go with it, and those
dimensions are connected to other dimensions in a snowflake - like pattern.
‘* One fact table makes up the snowflake schema, which is connected to several dimension
tables via a many - to - one connection and further connected to further dimension
tables. The third normal form is often used to normalize tables in a snowflake structure.
A hierarchy is performed by each dimension table at precisely one level.
*. The Snowflake Data Cloud includes a SQL data warchouse that was totally created in
the cloud. It was designed with a revolutionary, patented architecture to handle all
components of data and analytics, which enables it to combine high speed, high
concurrency, simplicity and affordability to levels not imaginable with conventional
data warehouses.
* Snowflake theoretically integrates storage, compute and services (such as user
‘management and metadata management), but keeps them physically separate. Due to the
fact that they are all individual components that may be expanded and contracted
independently, Snowflake is more responsive and. adaptable.
* All compute nodes have access to Snowflake's shared persistent data store. Although.
this is similar to shared - nothing architecture, Snowflake handles queries using MPP
(Massively Parallel Processing) compute clusters. In this architecture, each node in the
cluster stores locally a portion of the whole data set.
* Snowflake may serve as data lake while maintaining at-cost cloud data storage costs. A
Snowflake data lake can natively ingest and query a variety of various data types,
including JSON, CSV, tables, Parquet, ORC and more, with full transactional ACID
integrity in a relational structure.
* The Snowflake platform includes a data lake, data sharing and collaboration, data
marketplace, elastic infrastructure and interfaces for projects including data engineering,
data application development, data science and Al and ML.
* Organizations can focus on using data rather than managing it since Snowflake is a true
data platform-as-a-service because it automatically handles infrastructure, optimization,
infrastructure, data security and availability.
TECHNICAL PUBLICATIONS®- an upstrst fer knowledgeData Warehousing 4-31 Introduction to Data Warehouse
Benefits of the Snowflake schema
© The growth in query efficiency brought about by reduced disk storage needs and
connecting smaller lookup tables is the main benefit of the snowflake structure.
In the link between dimension levels and components, it offers higher scalability.
‘© There is no redundancy, making maintenance simpler.
‘A drawback of the Snowflake Schema
‘© The snowflake schema's biggest drawback is the increased maintenance requirements
brought on by the growing number of lookup tables. Another name for it is a multi-fact
star schema.
«There are more difficult-to-understand inquiries that are more sophisticated.
«More joins and tables result in longer query execution times.
T. Define data warehouse. Explain importance of data warehouse.
(Refer sections 1.1 and 1.4)
2. Explain in details different types of data warehouse tools, (Refer section 1.3)
3. Differentiate between operational database Vs data warehouse, (Refer section 1.8)
ca
4, What are the different components of data warehouse explain in detail.
(Refer section 1,7)
5. Explain source data component and data stagging component.
(Refer sections 1.7.1 and 1.7.2)
6. Explain in detail data warehouse architecture. (Refer section 1.9)
7. Explain different types of data warehouse architectures in detail. (Refer section 1.
8. Explain in detail autonomous data warehouse. (Refer section 1.10)
‘Two Marks Questions with Answers
Q.1 What Is data warehousing 7
Ans.
support management's decisions." A Data Warehousing (DW) process is used to gather and
manage data from many sources in order to provide pertinent business information, Business
data from many sources is routinely linked together and analyzed using a data warehouse,
"Data Warehouse is a subject-oriented, integrated and time-variant store of information to
TECHNICAL PUBLICATIONS® - an upthrust or nowedyeData Warehousing 1-92 Introduction to Data Warehouse
Q2__ What are the different tools used for the data warehousing?
‘Ans. : The tools that enable proper sourcing of data formats and contents from extemal data
storage into the data warehouse must carry out a number of crucial functions, including :
‘* Integration and consolidation of data,
‘© Data translation between different forms.
* Based on the function of business rules that need transformation, data transformation and
computation,
© Synchronization and maintenance of metadata, which includes archiving or updating
information about events, loading formats, transformation activities and source files
Q3 Why data warehousing is needed 7
Ans. : The following justifies the need for a data warehouse :
1, Business User : To access historical data summaries, business users need a data
warehouse. These non-technical persons could just understand the information,
2. Archive historical data : A data warehouse is required to archive historical time-
variable data. This input is made to do a number of different tasks.
3. Make strategic choices : Several approaches may be based on the data in the data
warehouse, Data warehouse aids in strategic decision-making as a result.
4, For uniformity and consistency in data : The user may effectively work to give
uniformity and consistency in data by merging data from multiple sources.
5. Quick reaction times : Data warehouses need to be flexible and quick to respond
in order to handle erratic loads and query patterns.
Q4 Enlist the component of data warehouses ?
‘Ans. : Users may access the converted data in the data warehouse using business intelligence
tools, SQL clients and spreadsheets once the data has been changed, transformed and ingested.
Data from several sources is merged in a data warehouse to produce a large database.
Central database : The data watehouse is « database. They were typically local or cloud-based
‘common relational databases. However, in-memory databases are quickly gaining acceptance due to
Big Data, the need for true, real-time performance and a sharp drop in RAM prices.
Data integration : Several data integration methods are used to extract data from source systems,
change it and align it. for easy analytical consumption. ETL (extract, transform, load) and ELT
procedures are among them, as well as real-time data replication, bulk-load processing, data
twansformation and services for data quality and enrichment.
TECHNICAL PUBLICATIONS? - en up-truat for krowiadgeData Warehousing 4-93. Introduction to Data Warehouse
Meta Data : Data about our data is known as metadata. All of the characteristics of the data sets
in data warehouse-their origin, purpose, values and other details-are recorded. Technical metadata
describes where and how data should be stored, as well as how to get it, Business metadata provides
the context of daa.
Q5 What aro uses of operational database ?
‘Ans.
‘© High-volume transaction processing is supported by operational systems.
© Current data are often an issue for operational systems.
# According to need operational systems are mainly updated regularly.
«It is created for corporate operations and real-time transactions.
«It is designed to do a limited number of straightforward operations, often adding or retrieving.
a single row per table at atime,
It employs validation data tables and is designed to validate incoming data during
transactions
«© Itcan accommodate thousands of clignts at once.
Q6 What are the benefits of data warehousing ?
Ani
© Recognize market trends to enhance our forecasts.
«© Data warehouses are designed to manage enormous amounts of data.
End users can navigate, understand and query data warehouse architectures more easily.
© Complex queries that would be challenging to build and handle in many normalised databases
could be made easier by data warehouses.
‘© Data warehousing is a useful method for managing the demand for a lot of information from a
lot of customers.
Q.7 What are the sources of data warehousing ?
Ans.
© Produetion data : These kinds of data come from the many operating systems used by the
company, Based on the data requirements in the data warehouse, we choose data segments
from the various operational modes.
‘© Infernal data : Databases, reports, customer profiles and occasionally even spreadsheets are
kept "private" by each client organization. This is intemal data, some of which may be
included into a data warehouse,
Lo
TECHNICAL PUBLICATIONS® -an up-trust fr kronfedgeData Warehousing 4-34 Introduction to Data Warehouse
* Archived information : Operational systems’ main objective is to manage the current
business. Every operating system has old data that we regularly erase and store in achieved
files.
+ External information : For the information they use, the majority of CEOs rély extensively
(on data from outside sources. They make use of statistics produced by a different department
that are particular to the sector.
Q8 Enlist the types of data warehouse architecture.
‘Ans. : Three tier architecture :
1. Bottom tier (Data warehouse server)
2, Middle tier (OLAP server)
3. Top tier (Front end tools).
goa
TECHNICAL PUBLICATIONS® - an upshrust or knowledge