0% found this document useful (0 votes)
30 views34 pages

Data Warehousing Notes

Data warehousing notes unit 1 full notes for 3 rd year students

Uploaded by

hariprasath4627
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
30 views34 pages

Data Warehousing Notes

Data warehousing notes unit 1 full notes for 3 rd year students

Uploaded by

hariprasath4627
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 34
UNIT I Introduction to Data Warehouse Syllabus Data warehouse Introduction - Data warehouse components- operational database Vs data warehouse ~ Data warehouse Architecture ~ Three-tier Data Warehouse Architecture - Autonomous Data Warehouse- Autonomous Data Warehouse Vs Snowflake - Modern Data Warehouse. Contents 14 12 18 14 18 16 17 18 19 Introduction History of Data Warehouse Data Warehouse Tools Need for Data Warehouse Benefits of Data Warehouse Data Warehouse Working ? Data Warehouse Components Operational Database Vs Data Warehouse Data Warehouse Architecture 1.10 Autonomous Data Warehouse 1.11. Snowflake : A Different Date Warehouse Architecture 1.12 Two Marks Questions with Answers aa Data Warshousing 1-2 Introduction to Data Warehouse Introduction © The business firms often generate billions of bytes of data every day on all parts of company, which contains millions of unique details about their clients, personnel and operations. However, access to this data is severely restricted and sealed up. Executives and decision - makers have access to a very limited portion of the data that is actually gathered, processed, and stored in the company. © A new technology has recently developed from new ideas and tools, enabling all the essential players in an organization to have access to the knowledge they need to thrive in a world that is becoming more and more competitive. "Data warehousing” is the name given to this new technology. I will be talking about the fundamental ideas and vocabulary of data warehousing in this unit, © Our first desktop "What if” processing test was with the Lotus. The goal of a data warehouse is to leverage the information of company has acquired to help it respond more effectively, intelligently, quickly and efficiently. ‘© Modern organizations depend on the efficient gathering, storing and integrating of data from many sources for analysis and insights. These data analytics roles are now vital to cost reduction, revenue growth and profit maximization. Therefore, it is not surprising that both the amount of data that has been collected and analyzed as well as the variety and number of data sources have expanded, * Data-driven enterprises need robust solutions to handle and analyze massive volumes of data throughout their whole organization. These solutions are reliable, scalable and secute enough for regulated industries while supporting a broad variety of data formats and use cases. Any typical database would struggle to meet the needs. The data warehouse may be of assistance here. ‘© A data warehouse is created by combining data from several sources. Ad hoe or planned inquities, analytical reporting and decision - making are made easier. This course uses @ step-by-step technique to teach every fundamental data warchousing theory. ‘* A Relational Database Management System (RDBMS) concept called a data warehouse was created to address the needs of transaction processing systems. Any centralized data store that may be searched for commercial advantages can be broadly characterized as such, It is a database that holds data intended to satiate inquiry - based decision - making, It is a collection of technologies designed to aid in decision - making and empower knowledge workers (executive, manager and analyst). In order to help TECHNICAL PUBLICATIONS? - an upthrust or nowedye Data Warehousing 21:3 Introduction to Data Warehouse ‘corporate leaders consistently organize, comprehend and apply their information to ‘make strategic choices, data warehousing supports structures and tools. «A Data Warehousing (DW) process is used to gather and manage data from many sources in order to provide insightful business information. Business data from many sources is often connected and analyzed using a data warehouse. The central component of the BI system, which is designed for data analysis and reporting, is the data warehouse. The combination of several technologies and elements facilitates the strategic use of data, Large amounts of data are electronically stored by a company and are intended for analysis and inquiry rather than transaction processing, It is a process of converting data into information and promptly making it accessible to people so that it might have an impact. © An OLAP engine, customer analysis tools, an Extraction, Transportation and Loading. (ETL) solution, as well as additional applications that manage data collection and delivery to business users are all included in the data warehouse environment. What is Data Warehouse ? © A data system having the following characteristics might be thought of as a data ‘warehouse : © It isa database that uses information from many apps to-be used for investigative activities, © It allows for reasonably lengthy exchanges between a limited number of ‘customers. © Itcontains both recent and old data to give information a historical context. © Its use involves a lot of reading © There are a couple big tables in there. © "Data Warehouse is a subject - oriented, integrated and time - variant store. of information to support management's decisions." + A Data Warehousing (DW) process is used to gather and manage data from many sources in order to provide pertinent business information. Business data from many sources is routinely linked together and analyzed using a data warehouse. © Data from many sources, including as point-of-sale operations, marketing automation, relationship management and more, is processed and reported on using an enterprise system called a data warehouse. Ad hoc analysis and customized reporting are both TECHNICAL PUBLICATIONS®- an up-tust for rowed Data Warehousing 1-4 Introduction to Data Warehouse, appropriate uses for data warehouses. A data warehouse is a crucial part of business intelligence since it was created to provide a long-term perspective on data throughout time and can store both current and historical data in one location. History of Data Warehouse © Let us first review the historical management schemes of the analysis data and the factors that have led to the evolution of the data warehousing application class, Traditional approaches to historical data ‘© The operational systems and the data they process have received the majority of attention throughout the development of systems, Since it was impractical to maintain data in operating systems permanently, a framework for archiving the data such systems had processed was only conceived of after the fact. Operational systems need performance, but analysis systems need flexibility and a wide range of capabilities. These are the essential needs that distinguish operational systems from analysis systems. Data from legacy systems Over the last three decades, several platforms have evolved with the evolution of computer technology. Business system development was carried out on IBM mainframe computers in the 1970s utilizing programs like Cobol, CICS, IMS, DB2, etc. ¢ The 1980s saw the development of computer platforms like the AS / 400 and VAX / VMS. The client / server architecture, which is still in use today, was introduced by the server platform UNIX in the late 1980s and early 1990s, + The mainframe environment from the 1970s is still used to operate a significant number of commercial applications, despite all the changes in platforms, architectures, tools and technology, The main reason is that through time, these systems have accumulated business knowledge and regulations that are very challenging to transfer to a new platform or application. These are often referred to as legacy systems. Such technologies eventually make the data they store distant and challenging to access. Extracted inform: ion on the desktop © The personal computer has significantly increased in use for business analysis during the last ten years. Many of the tools needed to utilize spreadsheets for analysis and graphical depiction are now available to business analysts. Advanced users will regularly work with and store data retrieved from the historical sources using desktop database tools © The above has the drawback of leaving the data fragmented and focused on very particular requirements. Only the information that each individual user needs has been acquired. The needs of many consumers and uses carinot be met by the extracts. TECHNICAL PUBLICATIONS? - an up-trust for knowledge Data Warehousing 1:6 Introduction to Data Warehouse ‘Addressing the needs of a single user requires a lot of effort and money. The drawbacks experienced prompted the creation of the new application known as data warehousing. Factors, which lead to data warehousing * The data warehousing field has quickly developed due to a variety of variables. The development of hardware and software technologies has been the most crucial element. * Costs of hardware and software Prices for hardware and software have significantly decreased, Cheaper memory chips with a higher capacity are readily accessible. ‘© Strong preprocessors : Preprocessors nowadays are far more powerful than mainframes of the past, such as Pentium III and Alpha processors. « Budget-friendly disks : As hard disk costs decrease, they can now hold hundreds of gigabytes. In the 1970s and the early 1980s, it would have taken an entire roomful of disk drives to store the amount of data that can now be stored on a single disk drive that is just one inch high. ‘© Strong desktop analysis tools desktops can do tasks that the old - school mainframe computers couldn't, including multi - tier computing, client / server architecture, or user - friendly GUI interfaces. «Server software : When compared to earlier versions, server software is now reasonably priced, robust and simple to maintain. ‘© An example of this is Windows NT, which has made the configuration of powerful computers incredibly simple and cheaper. © The most significant factor in the development of data warehouses has been the exponential growth of hardware and sofiware capabilities, together with the accessibility of low - cost, user - friendly reporting and analytical tools. Emergence of standard business applications, '* Users may now get popular business application packages from new suppliers. The German software companies SAP AG, Baan, PeopleSoft and Oracle have all released software suites with varying capabilities but similar funetions. Standard applications are offered by these application suites, which may take the place of the old custom - developed apps now in use. ‘© This is what has caused such programs to become more popular. ‘+ Additionally, compared to mainframes, these applications’ data gathering is far more straightforward. TECHNICAL PUBLICATIONS® - an vptrt for knowledge Data Warehousing 1-6 Introduction to Data Warehouse End - user more technology oriented ¢ The development of a technology - focused business analyst is one of the most significant outcomes of the tremendous investment in technology and march toward the powerful personal computer. Even while technology - focused end users are not always advantageous to all projects, this trend has undoubtedly generated a crop of business analysts who are at the foreffont of technology who are now crucial to today’s businesses. These technologically savvy end customers have typically contributed significantly to the design and implementation of data warehouses. They are now the primary users who are the first to highlight the early advantages of data warchouses. These end users are essential for the growth of the data warehouse model because they teach other users as they gain expertise with the system. ‘© Users may conduct more precise analyses and improve the functioning of their organizations by using the data warchouse. As computers got more advanced and required to process more and larger amounts of data, the necessity to store data developed. But data warehousing is hardly a brand-new concept. Following are some key events in evolution of data warehouse - ‘+ 1960 - The phrases measurements and facts are created in 1960 as a result of a cooperative research initiative between Dartmouth and General Mills ‘+ 1970 - Retail sales dimensional data marts are introduced by Nielsen and IRI. «1983 - A system for managing databases created with decision assistance in mind is, offered by Tera Data Corporation. © When IBM employees Paul Murphy and Bary Devlin created the Business Data ‘Warehouse in the late 1980s, data warehousing was bom. ‘* However, Inman Bill provided the actual notion. He was revered as the inventor of the data warehouse. He has written on a range of subjects related to the construction, operation and upkeep of the warehouse and the corporate information factory. Characteristics of Data Warehousing © Typically, a data warehouse has four characteristics, according to W. H. Inmon, author of Building the data warehouse and the expert who is usually regarded as the concept's creator ‘© A data warehouse, according to W. H. Inmon, is "a subject - oriented, integrated, nonvolatile, time - variant collection of data to support management's decisions." Access to data for in-depth analysis, knowledge discovery and decision - making is made possible by data warehouses, TEGHNIGAL PUBLICATIONS? - an up-hrst for rowed Data Warehousing 4-7 Introduction to Data Warehouse Subject oriented ‘© The organization of data is subject - based rather than application - based, for example, ‘an insurance business employing a data warehouse would arrange its data by customer, ‘premium and claim rather than by various produets (cat, life, ete.). The subject - specific data only includes the details required for decision support processing. Integrated ‘© Encoding of data is often uneven when it is stored in different financial apps in the operational environment. For instance, although gender may be classified as "m" and "f" in ‘one application, 0 and | in another. When data are transferred from the operational environment to the data warehouse, they are assumed to follow a uniform coding scheme, for example, gender data is changed to"m" and "f." Change in time ‘© Data that are five to ten years old or older may be stored in the data warehouse and utilized for comparisons, trends and forecasts. These statistics are outdated. Non-volatile © Once they are entered into the data warehouse, the data are merely accessed - they are not updated or altered in any way. «These specific qualities define data warehouses. © Acconceptual multidimensional perspective. Universal dimension. Limitless aggregation and dimension levels. 0 0 0 ‘Unrestricted operations across dimensions. Handling of dynamic sparse matrices. ‘The client - server architecture. Multi - user assistance. Availability. Openness 0.000 Intelligent data manipulation, Accurate performance in reporting. 0000 Flexible reporting ‘© Data warehouses are often an order of magnitude (and pethaps two orders of magnitude) bigger than the source databases since they contain massive amounts of data TECHNICAL PUBLICATIONS® - an up-to knonfedy= Data Warehousing 1-8 Introduction to Data Warehouse: Enterprise - wide data warehouses, virtual data warehouses and data marts have all been used to address the problem of the sheer amount of data, which is most likely to be ‘measured in terabytes ‘© Enterprise - wide data warehouses are huge initiatives requiring substantial time and resource investments. + Virtual data warehouses provide users access to materialized views of operational databases. * Data marts are often more narrowly focused and aimed to a certain group of the company, like a dependent. * In order to summarize the aforementioned, the following key elements about distinct data warehouse features should be kept in mind © Subject - focused ‘* Organized around key topics, such customers, products and sales. Concentrating on data modeling and analysis for decision - making rather than routine tasks or transaction processing, * Provide a clear and brief overview of a certain topic by excluding information that is not, relevant to the decision - making process. Integrated ‘Built by combining several, disparate data sources, such as relational databases, flat files and online transaction records. ‘Offering methods for data integration and cleansing. Time variation * Compared to operational systems, the data warehouse has a much longer time horizon. © Every key structure in the data warehouse has an explicit or implicit reference to time. ‘© Non - volatile data that has been changed from the operational environment that is physically stored separately and doesn't need transaction processing, recovery, or concurrency control methods. Only requires the initial loading of the data and the access of the data (no data updates are necessary), Data Warehouse Tools © The tools that enable peroper sourcing of data formats and céntents from external data storage into the data warehouse must carry out a number of crucial functions, including : ‘© Integration and consolidation of data. E TECHNICAL PUBLICATIONS - an up-rus for knowledge Data Warehousing 1-9 Introduction to Data Warehouse © Data translation between different forms © Based on the function of business rules that néed transformation, . data transformation and computation. © Synchronization and maintenance of metadata, which includes archiving or updating infortnation about events, loading formats, transformation activities and source files. ‘© There are several selection criteria which should be considered, while implementing a data warehouse © It is required to be able to recognize the data that the tool can read from the data source environment. © Itis crucial to provide support for flat files, indexed files and older DBMSS. In many setups, the ability to combine records from several data stores is necessary. © Conversation and the specification interface, which indicates the information to be extracted, are crucial © Itis desirable to be able to read data from repository products or data dictionaries. ©The tool's developed code need to be entirely maintained, © Users may extract just the necessary data using selective data extraction for both data items and records. © For the conversion of data into information, a field - level data inspection is required. © When transferring data across disparate systems, the capacity to execute data type and character - set translation is necessary. © Aggregation, summarization and derivation fields and records must be able to be created. © Vendor reliability and product support are important factors that need to be carefully considered. Goals of data warehousing «To support reporting and analysis, «Keep the organization's historical records up to date + Serve as the basis for decision - making. "TECHNICAL PUBLICATIONS® an vptnut for knowledge Data Warehousing 1-10 Introduction to Data Warehouse Need for Data Warehouse ‘© The following justifies the need for a data warehouse 1. Business ser : To access historical data summaries, business users need a data warehouse. These non - technical persons could just understand the information. 2. Archive historical data: A data warehouse is required to archive historical time-variable data. This input is made to do a number of different tasks. 3. Make strategic choices : Several approaches may be based on the data in the data warehouse. Data warehouse aids in strategic decision - making as a result. 4, For uniformity and consistency in data : The user may effectively work to give ‘uniformity and consistency in data by merging data from multiple sources. 5. Quick reaction times : Data warehouses need to be flexible and quick to respond in order to handle erratic loads and query patterns. Benefits of Data Warehouse © Recognize market trends to enhance our forecasts. Data warehouses are designed to manage enormous amounts of data. End users can navigate, understand and query data warehouse architectures more easily. Complex queries that would be challenging to build and handle in many normalised databases could be made easier by data warehouses. Data warehousing is a useful method for managing the demand for a lot of information from a lot of customers. + Data warehousing enables the analysis of a substantial amount of historical data. What Is a Data warehouse used for ? © Here are the industries where data warehouses are most often used © Airline : It is utilised for operational purposes in the airline system, such as personnel assignment, studies of route profitability, frequent flyer programme promotions, ete ‘+ Banking : It is often used in the banking industry to efficiently manage the resources on the desk. A few banks are also utilized for operations, product performance monitoring, and market research. ‘© Healthcare : Data warehouses were also utilized by the healthcare industry to plan and forecast results, provide patient treatment reports and communicate data with affliated insurance firms, medical assistance organizations, ete. TECHNICAL PUBLICATIONS? - an up-tvus fr knowledge Data Werehousing 4-1 Introduction to Data Warehouse '® Government sector ; Data warehouses are utilised for intelligence collection in the public sector. It aids in the upkeep and analysis of each person's tax data and health insurance records by government authorities. Sectors of investment and insurance ‘© In this industry, warehouses are largely used to watch market trends, analyse consumer trends and analyse data patterns. Keep the chain : ‘© Data warehouses are often utilised in retail chains for distribution and marketing. ‘Additionally, it aids in keeping track of products, consumer purchasing trends, promotions and pricing strategy. Telecommunication : ‘© In this industry, distribution choices, sales decisions and product marketing decisions are all made using a data warehouse. The hospitality sector © Based on customer feedback and travel habits, this industry uses warehousing services to plan and predict the locations for its advertising and marketing efforts. Data Warehouse Working ? ‘Steps to implement data warehouse © Using a three - pronged approach, is the best method to reduce the business risk connected with the development of a data warehouse. © Enterprise strategy In this section, we define technical terms such as existing architecture and technologies. Additionally, we define facts, dimensions and characteristics. Additionally passed are data transformation and mapping. © Phased implementation : Data warehouse implementation should be done in accordance with topic areas. Booking and billing are examples of related business entities that should be developed first before being connected, ‘© Iterative prototyping : Developing and testing the data warehouse iteratively is preferable than a big bang approach to implementation. . # ‘One or more data sources provide their data to a data warehouse, which acts as a central store for the data. The transactional system and other relational databases send data to data warehouse. TECHNICAL PUBLICATIONS® - an upstivust for knowledge Data Warehousing 1-12 Introduction to Data Warehouse Data may be : 1. Structured 2. Semi - structured 3. Unstructured data * Users may access the converted data in the data warehouse using business intelligence tools, SQL clients and spreadsheets once the data has been changed, transformed and ingested. Data from several sources is merged in a data warehouse to produce a large database. An organization may analyses its clients more thoroughly by combining all of this data.in one location. This makes sure that all the information is taken into account. Data mining is made feasible by data warehousing. Data mining searches for patterns in the data that might result in increased revenue and profitability. Advantages of data warehouse * Business users may easily access crucial data ftom a variety of sources using data warehouses. + Consistent data on’ multiple cross - functional operations is provided via data warehouse. Ad hoe reporting and querying are also supported. * To lessen the strain on the production system, data warehouses assist in integrating several data sources. ‘Using a data warehouse may speed up analysis and reporting overall. ‘The user may utilize it more easily for reporting and analysis thanks to restructuring and integration, © Users may obtain crucial data from several sources in a single location thanks to data warehouses. As a result, it saves users’ time while obtaining data from various sources. ‘© A substantial quantity of historical data is kept in data warehouses. Users may use this to analyze various historical periods and patterns to forecast the future. Disadvantages of data warehouse © Appoor choice for unstructured data, © The development and implementation of a data warehouse are undoubtedly time- consuming tasks. * Data Warehouse may easily become out of date. © Changes to data types and ranges, data source structure, indexes and searches are challenging. TECHNICAL PUBLICATIONS® - an up-rust for knowledge Data Warehousing 4-13. Introduction to Data Warehouse ‘© Although the data warehouse may seem simple, most consumers would find it to be excessively complicated. © The scope of a data warehousing project will constantly expand, even the finest project management efforts. ‘© Users of warehouses may sometimes create unique business rules. ‘© Organizations must invest a significant amount of their resources in training and implementation. Data Warehouse Components «Users may access the converted data in the data warehouse using business intelligence tools, SQL clients and spreadsheets once the data has been changed, transformed and ingested, Data from several sources is merged in a data warehouse to produce a large database. © Fig. 1.7.1 shows the components of a data warehouse. oS =a Metadata Fig. 1.7.4 ‘© Central database : The data warehouse is a database. They were typically local or cloud - based common relational databases. However, in - memory databases are quickly gaining acceptance due to big data, the need for true, real - time performance and a sharp drop in RAM prices. ‘© Data integration : Several data integration methods are used to extract data from source systems, change it and align it for easy analytical consumption. ETL (Extract, Transform, Load) and ELT procedures are among them, as well as real-time data replication, bulk - load processing, data transformation and services for data quality and enrichment. «Data about our data is known as metadata, All of the characteristics of the data sets in data warehouse - their origin, purpose, values and other details - are recorded, Technical metadata describes where and how data should be stored, as well as how to get it. Business metadata provides the céntext of data. TECHNICAL PUBLICATIONS® - an uphrust for nowedye 4 Introduction to Data Warehouse Data Warehousing © Tools for accessing data warehouse : With the help of these tools, users may interact with the data inside, A few examples of access tools are data mining, OLAP, query and reporting tools and application development tools. The best approach to assemble the pieces is via architecture. We combine hardware and software elements to create a data warchouse, We reserve the right to shift equipment and services to another site, however these structures are built to meet the needs of our organizations. They are all dependent on the situation we are in. / ‘mining Source data Management and conteo 5 : 3 a sS : Ss E = ‘Multi- \-—— | dine _ 0 AW@@ga S =| \e == Report / Query Data staging Fig. 1.7.2 Data warehouse component * The graphic shows the essential elements of a typical warehouse. The source data component is located on the left. The data staging component is the following building element, In the middle is the data storage component, which controls the data in the data warehouses. Along with storing and managing data, this component also manages and tracks data via the metadata repository. The information delivery component on the right displays the many ways that users can obtain data from data warehouses. Source Data Component * Following are the categories used to classify the source data entering the data ‘warehouses : 1. Production data : These kinds of data come from the many operating systems used by the company. Based on the data requirements in the data warehouse, we choose data segments from the various operational modes. 2, Internal data : Databases, reports, customer profiles and occasionally even TECHNICAL PUBLICATIONS® - an uptrut fr inowladgs Data Warehousing 4-15 Introduction to Data Warehouse spreadsheets are kept "private" by each client organization, This is internal data, some of which may be included into a data warehouse 3, Archived information : Operational systems’ main objective is to manage the current business. Every operating system has old data that we regularly erase and store in achieved files. 4, Extemal information : For the information they use, the majority of CEOs rely extensively on data from outside sources. They make use of statistics produced by a different department that are particular to the sector. Data Staging Component ‘© Data must be extracted from several operating systems and extemal sources before being processed for storage in the data warehouse. For the purpose of querying and analysis, it is required to modify, cliange and prepare the data that has been obtained from various sources in an acceptable manner. © The three main tasks performed in the staging area will now be covered. 1) Data extraction : This technique must handle a variety of data sources. Each data source requires a different set of methodologies, which we must use. 2) Data transformation : It is well known that data for a data warehouse comes from a number of sources. For a data warehouse, data transformation poses even more challenges than data extraction. We do a variety of different actions to change data. ‘© First, the information that was obtained from each source is cleansed, When merging data from several sources, eliminating duplicate entries or setting default values for missing data elements are all examples of cleaning. © Standardizing data component components is a crucial part of data transformation. Data transformation may take many different forms when combining information from several sources, We combine relevant data from many sources or data ftom a single source record. # Data transformation, on the opposite hand, also entails separating outsourced records into fresh combinations and deleting pointless source data, Data are massively sorted ‘and combined at the data staging area. When the data transformation function is finished, we get a set of integrated, purified, standardized and summarized data. ‘* Data loading : The two sorts of actions that make up data loading functions are different. When the data warehouse's structure and construction are complete and it first goes online, the data is initially loaded into the storage area of the data warehouse, At first, large volumes of data must be transported, which takes time. TECHNICAL PUBLICATIONS? -an up-tiust for nowedye Data Warehousing 1-16 Introduction to Data Warehouse Data storage hardware * An information split repository is used in data warehousing. Most often, the operating systems data vaults only hold the most current information. These data repositories also provide entirely normalized data that is arranged for quick processing. Information delivery component. © The process of subscribing for data warehouse files and having data transported to one or more destinations in accordance with some scheduling algorithm set by the client is made possible by the information delivery element. Component of metadata + A data warehouse's metadata is equivalent to a database management system's data dictionary or data catalogue. We save details about logical data structures, records and addresses, details about indexes and other details in the data dictionary. Data marts ‘+ Ttcontains a portion of corporate - wide data that is useful to a particular user group. The focus is limited to a few carefully chosen topics. Although advancements in the data warehouse business have made regular and incremental data dumps more feasible, data in a data warehouse should be at least somewhat current but not necessarily up to the ‘minute, Data-marts often have organization and are smaller than data warehouses. A data warehouse should be constructed together with multiple smaller, related data marts for certain types of queries and reporting, according to current data warehousing trends. Management and Control Component * The services and operations within the data warehouse are coordinated by the management and control elements. These parts regulate how the data is transformed and how it is sent to the data warehouse storage. On the other side, it controls how customers are sent data. It collaborates with database management systems and grants permission for data to be properly preserved in repositories. It keeps track of how information is sent into the staging system and then into the actual data warehouse storage Data warehouse tools * There are many data warehousing tools are available in the market. Here, is some most prominent one : 1, Mark logic : Data warehousing tool Mark Logic uses a variety of corporate capabilities to simplify and accelerate data integration, This tool helps in carrying out TECHNICAL PUBLICATIONS® - an updhrust for inowedge Date Warehousing 1-17 Introduction to Data Warehouse, very difficult search procedufes. It can query several sorts of data, including relationships, documents and metadata. 2. Oracle : The major database in the sector is Oracle. There are several data warehouse systems available, both on-premises and in the cloud. By improving operational effectiveness, it aids in optimizing the client experience. 3, Amazon RedShift : A tool for data warehouses is Amazon Redshift. All forms of data may be easily and affordably analyzed using eurrent BI tools and standard SQL. ‘Additionally, it permits the use of query optimization to conduct complicated searches against petabytes of structured data. EM Operational Database Vs Data Warehouse St. Operational database Data warehouse 1. Hiigh - volume transaction processing is OLAP, or ouline analytical processing, supported by operational systems is often supported by data warchousing systems. High - volume transaction processing is supported by operational systems. 2. ‘Current data are often an issue for The majority of the time, data operational systems. ‘warehousing solutions focus on historical data. Ea “According te need, operational systems Non - volatile data may often be are mainly updated regularly updated. Once added, litte was altered. 4 Tris created for corporate operations _Itis intended forthe éxamination of and real - time transactions, business metrics by domain, scope and characteristics, 5 itis designed to do a limited number of Tt designed to handle heavy straightforward operations, often workloads and sophisticated, erratic adding or retrieving a single row per queries that access numerous rows per table ata time, table 6 Tt employs validation data tables and is No real-time validation is necessary designed to validate incoming data _—_due to the abundance of reliable, uring transactions. consistent information. 7. ican accommodate thousands of clients at once. TECHNICAL PUBLICATIONS® - on up-hnst for nowedye Data Warehousing 1-18 Introduction to Data Warehouse * The main goal of early database systems was to satisfy the requirements of operational systems, which are often transactional in nature. ‘+ There are many different systems, such as general ledgers, accounts payable, financial management, order processing, order entry and inventory. * By their very nature, operational systems are largely focused on managing a particular transaction. When, the consumer, put money into checking account, the banking operational system is in charge of documenting the transaction to make sure the associated debit shows up in account record, i. * One order, one account and one inventory item are often the focus of an operational system. An operational system often encounters predetermined events that demand for quick access owing to their nature, Most transactions only include modest quantities of data, © The business requirements for an operational system mostly remain constant. The programme that records the transaction and the application that restricts access to the data, or the porting side of the banking company, seldom ever change. The data needed in this kind of system must be up-to-date when a consumer starts @ transaction, A bank ‘must first be confident of your current balance before allowing a withdrawal. © Data "Warehousing" away from operational systems * The main idea behind data warehousing is that by isolating the data used for business analysis from the data used in operational systems, it can be accessible more efficiently. Over time, many of the factors causing this divergence have changed. In the past, to reduce the performance effect on operating systems, older systems archived data onto tapes when it became inactive and several analytical reports ran from these tapes or ‘mirror data sources. . ‘+ With the exception of the fact that they are now taken into more formal consideration throughout the data warehouse design process, these reasons for separating operational data from analytical data have not materially altered with the development of data warehousing systems, * Many business: analysis procedures have become considerably more intricate and sophisticated as a result of technological advancements and changes in the nature of company. Modern data warehousing systems are capable of creating regular reports as well as more complex online analysis, such as multi - dimensional analysis. TECHNICAL PUBLICATIONS® - an uptrut for hnowldge Data Warehousing 1-19 Introduction to Data Warehouse [EEE Data Warehouse Architecture # In essence, a deta warehouse gives decision - support programs access to past data. Reporting, OLAP, Executive Information Systems (EIS), and data mining are examples of such applications. # A data warehouse, according to W. H. Inmon, the person who first coined the phrase, is a centralized, integrated collection of information. Integrate here refers to the cleaning up, merging, and redesigning. Depending on how many systems feed into a warehouse and how differently they handle comparable information, this may be more or less ifficut. © However, the majority of businesses already have information repositories in their production systems, many of which are centralized. ‘+ Not data warehouses, then. Actually, no, ‘© In terms of function and structure, data warchouses are different from production databases or Online Transaction Processing (OLTP) systems. A data warehouse is geared for data retrieval and reporting and is typically a read-only system, as opposed to an OLTP system, which is created and optimized for data input and updates. ‘© While o data warehouse includes the data utilized for business analysis, an OLTP system contains the data required for camying out day-to-day business activities. Data components that may be missing or unknown at the time of input are common in current and highly volatile OLPT data. Historical, nonvolatile data that has been corrected for transaction mistakes may be found in a warehouse. OLPT systems and data warehouses require various data - modeling techniques since their goals are so dissimilar. Due to the difficulty of updating OLTP systems with redundant data, redundancy is almost nonexistent. As a result, OLPT systems are often built using a relational architecture and are highly normalized. Redundancy, however, is preferred in data warehouses because it facilitates user access and improves performance by reducing the number of tables that ‘must be connected. © Some data warehouses choose @ multidimensional architecture over a. relational approach altogether. «© An adequate data model is required to discuss data warehouses and differentiate them from transactional databases, The OLAP and decision - support technologies match the ‘multidimensional data model well. A data warehouse is commonly a repository of integrated data from several sources that has been processed for storage in a multidimensional model, as opposed to multi - databases, which provide access to disconnected and typically heterogeneous databases. Data warehouses often offer TECHNICAL PUBLICATIONS? n pst for Arowed2 Data Warehousing 1-20 Introduction to Data Warehouse time - series and trend analysis, which both ask for more historical data than is commonly kept in transactional databsses. This is in contrast to the majority of transactional databases. Data warehouses are nonvolatile in comparison to transactional databases, As a result, the data Warehouse's information changes much less often and ‘may be thought of as non-real-time with periodic updates. ‘* Transactions are the unit and the driving force behind database change in transactional systems; by contrast, data warehouse information is considerably more coarsely grained and is updated in accordance with a well - chosen reffesh strategy, which is often incremental. The warehouse's acquisition component, which offers all necessary reprocessing, manages warehouse updates. * A set of decision support tools that are intended at assisting knowledge workers (executive, manager and analyst) in making quicker and better choices is another way to define data warehousing in general. An overview of a data warchouse's conceptual framework may be seen in the accompanying Fig. 1.9.1 It displays the whole data warehousing procedure. Before data is stored, this procedure may include cleaning and reformatting. Following the procedure, OLAP, data mining, and DSS may provide fresh, relevant data, such as rules; this data is shown in the image. Returning to the warehouse figure demonstrates that files may be used as data sources. Back fushing Data werehouse Data ‘Cleaning —= Reformatting = p——— na Databases moh (Other data inputs Updates / new data Fig. 1.9.41 * Apart from being trendy terms in the current IT sphere, Data Warehousing, Online Analytical Processing (OLAP), and Decision Support Systems are the anticipated outcomes of IT systems and present requirements. Long before the term “online transaction processing" (OLTP) was coined, information management systems were only concemed with collecting and storing data related to routine, basic transactions in database management systems. TECHNICAL PUBLICATIONS? «an upshrust for owed Deta Warehousing 1-21 Introduction to Data Warehouse. ‘¢ Managers and analysts must now go beyond the basic data storage stage and take use of IT systems by presenting complicated queries, receiving analytical findings, and making choices based on those results. The introduction of OLAP and Data Warehousing at this point provides the corporate world with the system design, guiding principles, methodological approach and - at long lest - tools required to support the presentation of useful Decision Support Systems. * LM. adopted the design and technique shown in the following image after close collaboration with the academic community, which had just lately followed the development of the commercial sector, which had been advancing and pioneering in the field for the previous ten years. This is the output of the "Foundations of Data Warehouse Quality - DWQ” Basic Research project, which was financed by ESPRIT.Apart from being trendy terms in the current IT sphere, Data warehousing, Online Analytical Processing (OLAP), and Decision Support Systems are the anticipated outcomes of IT systems and present requirements, Long before the term “online transaction processing" (OLTP) was coined, information management systems were only concemed with collecting and storing data related to routine, basic transactions in database management systems. ‘¢ Managers and analysts must now go beyond the basic data storage stage and take use of IT systems by presenting complicated queries, receiving analytical findings and making choices based on those results. The introduction of OLAP and data warehousing at this point provides ‘the corporate world with the system design, guiding principles, methodological approach, and - at long last - tools required to support the, presentation of useful decision support systems. © A data warehouse, also known as an OLAP system, is created by applying data ‘warehousing principles to conventional database systems and employing the right design tools since it is fundamentally reliant on architecture. The design and implementation of data warehouses and OLAP. systems follow the IMF's established ‘approach. ‘© Utilizing specialist data warehouse and OLAP systems, such as MicroStrategy's DSS Series, the final deployment is carried out. One of the most well - known and respected worldwide leaders in the field of data warehousing tools and systems, Micro strategy Inc. provides solutions for every tier of the DW architectural hierarchy. © The data warehouse architecture is a method for defining the overall architecture of data exchanges, processing and presentation that exists for end - client computing inside the organization. Despite the fact that every data warehouse is different, they nonetheless "TECHNICAL PUBLICATIONS® - an up-tst fer knowledge Data Warehousing 1-22 Introduction to Data Warehouse share a few crucial components. Online transaction processing (OLTP) is used in production systems including payroll, accounts payable, product buying and inventory ‘management. ‘¢ These programmes gather extensive data on everyday operations. + Data warehouse systems are designed to assist online analytical processing (OLAP), a ‘more modem activity. These incliide forecasting, trend analysis, profiling, and summary reporting tools. © Production databases are routinely updated using OLTP software or manually. A warehouse database, in contrast, gets regular updates ftom operating systems, usually after business hours. As OLTP data accumulates in production databases, itis frequently extracted, filtered and put onto a dedicated warehouse server that is accessible to users. As the warehouse is filled with data, tables must be de-normalized, data must be cleansed of errors and duplication and new fields and keys must be added to match the user's requirements for sorting, combining and averaging data. Users Sources Analysis ‘Operational system ‘Mining Fa fies Fig. 1.9.2 Data warchouse architecture : basic Principles of a data warehousing Load efficiency * Data warehouses need to continuously load new data within constrained time frames. The speed of the load process should be measured in gigabytes and hundreds of millions of rows per hour and it shouldn't artificially limit the amount of data that can be processed. TECHNICAL PUBLICATIONS? «en up-tvust fr krowisdge Data Warehousing 1-23 Introduction to Data Warehouse Processing of loads ‘¢ To add new or updated data to the data warehouse, a number of processes must be completed, including data conversion, filtering, reformatting, indexing and metadata updating. ‘© Fact - based management requires the greatest level of data quality. © Despite "dirty" sources and a large database, the warehouse must provide local consistency, global consistency and referential integrity. © Query performance The performance of the data warehouse RDBMS must not slow down fact - based management; huge, sophisticated queries must be finished in seconds, not days. ‘Scalability of terabytes ‘© The size of data warehouses is expanding at startling rates. These days, there are data ‘warehouses that are a few megabytes to hundreds of terabytes in size. [EEX Types of Data Warehouse Architecture 4. Three-tier architecture ‘The three-tier architecture is composed of the source layer, which comprises several source systems, the reconciliation layer and the data warehouse layer, which contains data warehouses and data marts. The reconciliation layer sits between the data warehouse and the source data. “ The main advantage of the reconciled layer is that it creates a consistent reference data model for the whole business. Additionally, it distinguishes between issues with data ‘warehouse filling and those with source data extraction and integration. In some instances, the reconciled layer is also used directly to improve the performance of specific operational tasks. Examples include producing daily reports that camnot be adequately prepared using corporate applications or creating data flows to regularly feed external processes in order to benefit from cleaning and integration. This approach is especially useful for big, enterprise-wide systems. One disadvantage of this setup is the extra file storage space needed for the duplicate reconciling layer that is added. As a consequence, the analytical tools are a bit less real - time, TEGHNIAL PUBLICATIONS? - an up-rust for knowledge Data Warehousing 1-24 Introduction to Data Warehouse ies aa Sarnia i det) Au He Output eee ee on sat Seay —— SS Boone, one s os — eae ef as Cees wee Fig, 1.9.3 Three-tier architecture * Data warchouses usually have a three ~level (tier) architecture that includes : 1, Bottom tier (Data warehouse server) 2, Middle tier (OLAP server) 3. Top tier (Front end tools). * The foundational element of the system is a data warehouse server running an RDBMS. Itcould include a repository for metadata and many specialized data marts. * Data is retrieved from operational databases and other sources (such as user profile information provided by extemal consultants) ‘through application programmed interfaces called gateways. SQL code for server execution may be created’ by client programmers using a gateway provided by the underlying DBMS. + An OLAP server in the intermediate layer for rapid data warchouse queries. * In order to construct the OLAP server, 1) A Relational OLAP (ROLAP) paradigm, which transforms multidimensional data fanctions into standard relational procedures, is an improved relational database ‘management s tem, TECHNICAL PUBLICATIONS® ~an up-thrust for knowledge, Data Warehousing 4225 Introduction to Data Warehouse 2) A Multidimensional OLAP (MOLAP) model, a specialized server that natively handles multidimensional processes and data. 'A top-tier that contains front - end tools for displaying results from OLAP as well as tools for data mining the data created by OLAP. ‘© Information that defines DW items is kept in the metadata repository. For middle - tier and top-tier applications, it contains the following specifications and data 1. An explanation of the data warehouse structure, including the locations, contents, hierarchies and warehouse schema. 2. Operational metadata, which generally indicates if a piece of data is active, archived, or deleted, as well as warehouse monitoring information like consumption information, error reports, audit findings, ete. 3. Index - containing system performance information is used to speed up data access and retrieval. Operational databases, which provide details on data cleansing and transformation methods, mapping for source RDBMSS, ete. 4, Business information, such as ownership. details, terminologies and pre-configured queries and reports. 4. Top-down approach : External sources Staging area Fig. 1.9.4 Top - down approach ‘© The following ié a discussion of the crucial elements © External sources - A location where data of any type is obtained is an ‘external souree. ‘Unstructured, semi - structured, or both types of data are all possible. TECHNICAL PUBLICATIONS? - on up-brust for krowiedo® Data Warehousing 1-26 Introduction to Date Warehouse ‘Stage area - The data must be checked before being imported into the data warehouse since it was obtained from external sources and does not follow a certain format. ETL software should be used to do this © E(Extracted) : Data was obtained from a third - party data source. * Data is transformed, or T (Transform), into the approved format. * Data is delivered into the data warehouse after being translated into the common format. ‘+ Data - warehouse - After being cleansed, data is stored in the data - warehouse as a central repository. While it really stores the meta data, the true data is saved in data ‘marts. Remember that this top - down approach preserves the data in the data warehouse in its most fundamental configuration. * Data marts ; The data mart is a part of the storage component. It keeps track of information for a single authority - managed organizational function. An organization may have any number of data marts, depending on the functions. We may also say that ‘the data mart contains a fraction of the data stored in the data warehouse, + Studying the enormous amount of data kept in a data warehouse is known as data mining. It is possible to find hidden patterns in databases or data warehouises by using a data mining programmed, ‘* When the whole data warehouse has been completed, data marts are created from it, according to Inmon, who claims that this technique leverages the data warehouse as the organization's primary repository. 2. Bottom - up approach External sources Staging area @©—]= © © Similar to a top - down approach, the data is initially obtained from external sources. pemunont[ Be & Fig. 1.8.5 Bottom - up approach TECHNICAL PUBLICATIONS? - an up-thrust for owed Data Warehousing 1-27 Introduction to Data Warehouse ‘* Following the staging area (as previously mentioned), the data are then loaded into data ‘arts rather than data warehouses. The data marts, which enable reporting, were created initially. It concentrates on a certain industry. ‘Then, these data marts are linked to the data warehouse, # According to Kinbal’s description of this approach, data marts are constructed initially to provide analysts a limited viewpoint and a data warehouse is developed when complete data marts have been established, Autonomous Data Warehouse * Oracle Cloud offers a suite of data management services based on self - driving Oracle Autonomous Database technology. These services iriclude automating all necessary database maintenance processes while the system is running, including patching, upgrading and tuning. Modern Data Warehouse © A contemporary data warehouse is a technique for acquiring and storing the data that is cloud - based. Organizations may analyze this data to help in decision - making. As a result, many companies utilize contemporary data warehouses to improve their operational, administrative and financial procedures. Quality cloud - based warehousing departments need this information to make better decisions. © Data analytics, reporting, data mining, machine learning and other Business Intelligence (BD procedures may be performed using a data warehouse, a central data management system that collects, stores and mixes data from many sources within an organization. © To allow effective analysis that is accessible to everyone within an organization, the data warehousing system collects, arranges and analyses data fiom many sources. © Data warehouses have been existed since the 1980s, but as big data has become more prevalent, they have seen significant transformation recently. Data warehouses are progressively including tools for sophisticated analytics and data visualization. 1. Elements of a modern data warehouse ‘© There are several major components to today's data warehouses. Infrastructure : ‘© In the past, businesses have switched their data from databases'to file systems to save money. According to EMA's Santaferraro, they are now switching from file systems to object storage. TECHNICAL PUBLICATIONS®- an uptrust for knowledge Dsta Warehousing 1-28 Introduction to Data Warehouse ‘+ It's important to keep in mind that inexpensive storage has its limits in the area of analytics, according to Santaferraro. "Cheap is not enough if the data are not available for analysis." '* Due to this, the UAW must provide a comprehensive and unified set of analysis capabilities across all storage levels. When necessary, he added, more sophisticated UAWs would automate the transfer of data into and out of file systems and object storage. Tools : ‘© Although many IT professionals link Hadoop with a data lake, there are numerous more \widely used solutions that are generally free source. These consist of : © Akey- value columnar database and storage solution called Apache HBase © Astorage, table and metadata management solution called Apache HCatalog ©” Large datasets are often processed using the scalable data processing technology Hadoop MapReduce. © A MapReduce - based open - source language called Apache Hive that facilitates the study of big datasets © AMapReduce task scheduling tool called Oozie © Apache Pig is a concurrent data processing language associated with MapReduce. © Ahhierarchical key - value store for synchronization called Apache ZooKeeper Cloud, multi-cloud and hybrid solutions : ‘* A ccontemporary data warehouse must enable cloud - based platforms as they are used by the majority of organizations nowadays to store at least part of their data. To allow these cloud platforms to exchange data, data warehouse should also offer cloud - to -cloud interoperability. A contemporary data warehouse should provide interoperability across various on - premises and cloud systems, enabling them to cooperate without isolating data on any of the aforementioned platforms. ‘© As businesses look for solutions to reunite structured warehouse data with unstructured data in the data lake, cloud platforms for UAWs have grown in popularity. © There is now more interest in a UAW strategy as a result of the transition by many organizations to a software - as - a - service model for corporate applications. Greater scalability, agility, cost savings, quicker deployments, less complicated disaster recovery and enhanced govemance and security capabilities are just a few of the advantages that may result. TECHNICAL PUBLICATIONS®- an upstust for knowledge Data Werehousing 1-29 Introduction to Data Warehouse Computing and processing : © To maximize an organization's infrastructure expenditures, certain cloud - based designs that enable the contemporary data warehouse totally segregate computational activities from storage. The flexibility to query data from any storage tier and complete isolation may result in significant reductions in total cost of ownership. ‘© That is partially due to the fact that cloud suppliers often charge more for computation than for storage (and naturally, the storage is required just for the compute - intensive ‘analytics procedures). So, if compute capacity can be reduced when not required, teams may use simply the storage capacity and save money. Again, when computing capacity is needéd by workloads, it may be dynamically spun up. © Multi - tiered data storage is advantageous for analytical applications that need a lot of data, The most cutting - edge systems provide complicated data types high - performance methods in their native format. [EA Modern Data Warehouse's Key Features ‘© In response to the advancement of cloud technology, which offers built-in scalability, high availability, performance and flexibility, data warehousing has undergone ‘tremendous transformation during the last ten years. ‘* Conventional on - premises data warehouses may still be able to assist an organization in achieving its objectives, but they are too costly and unable to manage the volume of data that an organization is continually creating. © A modem data warchouse allows to combine any sort of data, at any scale and instantly provide business intelligence insights for all users. These tools include dashboards, visualization tools and sophisticated analytics. © A modem data warehouse likewise prioritizes value generation above transaction activities and is mainly built for analytical needs. ‘Snowflake : A Different Data Warehouse Architecture © The star schema is analogous to a snowflake schema. If one or more dimension tables must link via other dimension tables in order to connect to the fact table, the schema is said to be a snowflake. © Each point in the star bursts into additional points in ‘the snowflake schema, which is an ‘extension of the star schema, The reason why it is termed the snowflake schema is because the graphic looks like a snowflake. A technique for standardizing the dimension tables in a STAR schema is called snow flaking. When all of the dimension tables are TECHNICAL PUBLICATIONS® - an up-trust for knowledge Data Warehousing 1-90 Introduction to Data Warehouse completely normalized, the resulting structure looks like a snowflake with the fact table in the center, * The development of the performance of certain queries uses snow flaking, Each fact in the schema is surrounded by the relevant dimensions that go with it, and those dimensions are connected to other dimensions in a snowflake - like pattern. ‘* One fact table makes up the snowflake schema, which is connected to several dimension tables via a many - to - one connection and further connected to further dimension tables. The third normal form is often used to normalize tables in a snowflake structure. A hierarchy is performed by each dimension table at precisely one level. *. The Snowflake Data Cloud includes a SQL data warchouse that was totally created in the cloud. It was designed with a revolutionary, patented architecture to handle all components of data and analytics, which enables it to combine high speed, high concurrency, simplicity and affordability to levels not imaginable with conventional data warehouses. * Snowflake theoretically integrates storage, compute and services (such as user ‘management and metadata management), but keeps them physically separate. Due to the fact that they are all individual components that may be expanded and contracted independently, Snowflake is more responsive and. adaptable. * All compute nodes have access to Snowflake's shared persistent data store. Although. this is similar to shared - nothing architecture, Snowflake handles queries using MPP (Massively Parallel Processing) compute clusters. In this architecture, each node in the cluster stores locally a portion of the whole data set. * Snowflake may serve as data lake while maintaining at-cost cloud data storage costs. A Snowflake data lake can natively ingest and query a variety of various data types, including JSON, CSV, tables, Parquet, ORC and more, with full transactional ACID integrity in a relational structure. * The Snowflake platform includes a data lake, data sharing and collaboration, data marketplace, elastic infrastructure and interfaces for projects including data engineering, data application development, data science and Al and ML. * Organizations can focus on using data rather than managing it since Snowflake is a true data platform-as-a-service because it automatically handles infrastructure, optimization, infrastructure, data security and availability. TECHNICAL PUBLICATIONS®- an upstrst fer knowledge Data Warehousing 4-31 Introduction to Data Warehouse Benefits of the Snowflake schema © The growth in query efficiency brought about by reduced disk storage needs and connecting smaller lookup tables is the main benefit of the snowflake structure. In the link between dimension levels and components, it offers higher scalability. ‘© There is no redundancy, making maintenance simpler. ‘A drawback of the Snowflake Schema ‘© The snowflake schema's biggest drawback is the increased maintenance requirements brought on by the growing number of lookup tables. Another name for it is a multi-fact star schema. «There are more difficult-to-understand inquiries that are more sophisticated. «More joins and tables result in longer query execution times. T. Define data warehouse. Explain importance of data warehouse. (Refer sections 1.1 and 1.4) 2. Explain in details different types of data warehouse tools, (Refer section 1.3) 3. Differentiate between operational database Vs data warehouse, (Refer section 1.8) ca 4, What are the different components of data warehouse explain in detail. (Refer section 1,7) 5. Explain source data component and data stagging component. (Refer sections 1.7.1 and 1.7.2) 6. Explain in detail data warehouse architecture. (Refer section 1.9) 7. Explain different types of data warehouse architectures in detail. (Refer section 1. 8. Explain in detail autonomous data warehouse. (Refer section 1.10) ‘Two Marks Questions with Answers Q.1 What Is data warehousing 7 Ans. support management's decisions." A Data Warehousing (DW) process is used to gather and manage data from many sources in order to provide pertinent business information, Business data from many sources is routinely linked together and analyzed using a data warehouse, "Data Warehouse is a subject-oriented, integrated and time-variant store of information to TECHNICAL PUBLICATIONS® - an upthrust or nowedye Data Warehousing 1-92 Introduction to Data Warehouse Q2__ What are the different tools used for the data warehousing? ‘Ans. : The tools that enable proper sourcing of data formats and contents from extemal data storage into the data warehouse must carry out a number of crucial functions, including : ‘* Integration and consolidation of data, ‘© Data translation between different forms. * Based on the function of business rules that need transformation, data transformation and computation, © Synchronization and maintenance of metadata, which includes archiving or updating information about events, loading formats, transformation activities and source files Q3 Why data warehousing is needed 7 Ans. : The following justifies the need for a data warehouse : 1, Business User : To access historical data summaries, business users need a data warehouse. These non-technical persons could just understand the information, 2. Archive historical data : A data warehouse is required to archive historical time- variable data. This input is made to do a number of different tasks. 3. Make strategic choices : Several approaches may be based on the data in the data warehouse, Data warehouse aids in strategic decision-making as a result. 4, For uniformity and consistency in data : The user may effectively work to give uniformity and consistency in data by merging data from multiple sources. 5. Quick reaction times : Data warehouses need to be flexible and quick to respond in order to handle erratic loads and query patterns. Q4 Enlist the component of data warehouses ? ‘Ans. : Users may access the converted data in the data warehouse using business intelligence tools, SQL clients and spreadsheets once the data has been changed, transformed and ingested. Data from several sources is merged in a data warehouse to produce a large database. Central database : The data watehouse is « database. They were typically local or cloud-based ‘common relational databases. However, in-memory databases are quickly gaining acceptance due to Big Data, the need for true, real-time performance and a sharp drop in RAM prices. Data integration : Several data integration methods are used to extract data from source systems, change it and align it. for easy analytical consumption. ETL (extract, transform, load) and ELT procedures are among them, as well as real-time data replication, bulk-load processing, data twansformation and services for data quality and enrichment. TECHNICAL PUBLICATIONS? - en up-truat for krowiadge Data Warehousing 4-93. Introduction to Data Warehouse Meta Data : Data about our data is known as metadata. All of the characteristics of the data sets in data warehouse-their origin, purpose, values and other details-are recorded. Technical metadata describes where and how data should be stored, as well as how to get it, Business metadata provides the context of daa. Q5 What aro uses of operational database ? ‘Ans. ‘© High-volume transaction processing is supported by operational systems. © Current data are often an issue for operational systems. # According to need operational systems are mainly updated regularly. «It is created for corporate operations and real-time transactions. «It is designed to do a limited number of straightforward operations, often adding or retrieving. a single row per table at atime, It employs validation data tables and is designed to validate incoming data during transactions «© Itcan accommodate thousands of clignts at once. Q6 What are the benefits of data warehousing ? Ani © Recognize market trends to enhance our forecasts. «© Data warehouses are designed to manage enormous amounts of data. End users can navigate, understand and query data warehouse architectures more easily. © Complex queries that would be challenging to build and handle in many normalised databases could be made easier by data warehouses. ‘© Data warehousing is a useful method for managing the demand for a lot of information from a lot of customers. Q.7 What are the sources of data warehousing ? Ans. © Produetion data : These kinds of data come from the many operating systems used by the company, Based on the data requirements in the data warehouse, we choose data segments from the various operational modes. ‘© Infernal data : Databases, reports, customer profiles and occasionally even spreadsheets are kept "private" by each client organization. This is intemal data, some of which may be included into a data warehouse, Lo TECHNICAL PUBLICATIONS® -an up-trust fr kronfedge Data Warehousing 4-34 Introduction to Data Warehouse * Archived information : Operational systems’ main objective is to manage the current business. Every operating system has old data that we regularly erase and store in achieved files. + External information : For the information they use, the majority of CEOs rély extensively (on data from outside sources. They make use of statistics produced by a different department that are particular to the sector. Q8 Enlist the types of data warehouse architecture. ‘Ans. : Three tier architecture : 1. Bottom tier (Data warehouse server) 2, Middle tier (OLAP server) 3. Top tier (Front end tools). goa TECHNICAL PUBLICATIONS® - an upshrust or knowledge

You might also like