Whitepaper: Datamarts, Extraction Tools, and Cognos Data Manager

Executive Summary

Businesses today are driven to manage enterprise performance but face the reality that key business information at the enterprise level is still largely inaccessible. Although organizations have spent countless millions on integrated systems in the past decade, the promise of integrated, enterprise-level reporting and analysis capabilities is still largely a myth in corporate America. Providing integrated enterprise management systems for reporting and analytics requires that systems be designed and built in distinctly different ways from systems that capture daily business transactions. A family of tools exists to build these systems that provide the business views consumers seek. This class of tools, herein referred to as Extract, Transform, and Load (ETL) tools, does much more than simply move data between databases. This paper explores the business and technology conditions which drive the need for integrated enterprise management systems. These systems are often called data warehouses or datamarts. It also provides an overview of ETL tools and their capabilities, and answers the question: when does my organization need a tool? For the Cognos client, we also explore Cognos Data Manager and explain how Data Manager simplifies managing the production-oriented aspects of a typical Cognos implementation.

Terminology
Consumer or Information Consumer. A person, at any level of the organization, who has a need for information.

Operational System. A system used primarily to capture the information necessary to run a business. These systems are designed to capture transactions rapidly. Leading ERP systems, accounting systems, custom billing systems, etc., are all examples of operational systems.

Informational System. A system specifically designed for reporting and ad-hoc query. These systems are designed to provide information about the transactions.

Source. A data location from which information is read.

Target. A data location to which data will be output.

Dimension. Structural information that defines a business. For example: locations, business units, products, etc.

Fact Table. A table that houses key measures stored at their lowest level of detail.

Build. A process that specifies input data, processes, and outputs.
Data Warehousing, ETL Tools, and Cognos Decision Stream www.firstquarter.net


Job. A series of builds that are sequenced and tied together as a unit.

Data Mapping. How data is organized and named as it flows from source to target.

Transformation. The intermediate processing that is performed on data as it is moved from source to target.
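To make the fact table and dimension terms concrete, here is a minimal sketch (the table and column names are hypothetical) of a fact table joined to a product dimension, using Python's built-in sqlite3 module:

```python
import sqlite3

# In-memory database standing in for a datamart (names are illustrative).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, sale_date TEXT, amount REAL);
""")
con.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Widget"), (2, "Gadget")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, "2004-01-05", 100.0), (1, "2004-01-06", 50.0),
                 (2, "2004-01-05", 75.0)])

# A typical informational query: roll the lowest-level facts up by dimension.
rows = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(rows)  # [('Gadget', 75.0), ('Widget', 150.0)]
```

The fact rows hold measures at the lowest level of detail; the dimension supplies the business structure the query groups by.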

Conditions That Drive Datamart Projects


Business Issues
At a strategic level, the need to effectively manage performance drives the need for datamarts in all organizations. Throughout the past decade, corporations large and small bought into the promise that an integrated system would provide an excellent reporting solution. In this context, integrated includes ERP systems, best-of-breed financial systems, packaged software systems, and custom-built applications. Many believed that their new systems would streamline processes, integrate data, and provide instant management reports.

In many cases, the unfortunate truth was that the promise was a myth. While the new systems streamlined processes and integrated data, seldom did they provide the reporting that was so important in the selection process. The reality was that the company now had a large data asset. With many integrated systems now installed, we find that consumers still need access to the information which drives the business. Consumers still seek a single version of the truth to answer the key business questions which must routinely be addressed. Strategically, there is a need to organize information into integrated enterprise management systems and make it generally available to consumers.

At a tactical level, organizations also face a variety of issues that surface when interviewing users about their daily routine. These issues are expressed as questions below.

1. Do I spend more time collecting and preparing data than I do analyzing it?
2. Do meetings sometimes stall because of a disagreement between two or more people about the validity of numbers?
3. Can I access both summary and detail information quickly from within the same application?
4. Am I ever frustrated by my lack of ability to access data that I know exists within my organization?
5. Do I or my staff spend time rekeying data from reports into databases or spreadsheets?
6. Is it impossible to predict all of the reports I will ever need?

7. Do I currently spend time trying to interpret data in reports?
8. Can I easily retrieve and review information that is critical to the success of my functional area?

While predicting the variety of ways that data must be formatted and presented is impossible, one can predict the universe of data needed to answer key business questions.
Enter the data warehouse and the datamart: technology buzzwords which define both the universe of data and the computer location where answers to key business questions are stored. Their goal is to improve business analysis and decision-making capability. For the purposes of this document, we will use the term datamart to define specific data subject areas (i.e., finance, sales, marketing, purchasing, etc.) which are stored in a database. The datamart:

- holds information which addresses both strategic and tactical information needs.
- provides information which allows key operating functions to effectively manage performance.
- unifies information from various databases into a single database. By unifying the information, consistent business views may be produced.

Beyond storing information of interest to consumers, corporations also need to address their business reporting needs in a consistent manner. Organizing information in a datamart addresses half of the data availability issue. The remaining half is addressed by providing access tools for reporting, analytics, and ad-hoc query. These access tools fall under a banner commonly referred to as Business Intelligence. Business Intelligence will not be discussed in detail here; however, it is strategically just as important as organizing information into a datamart.

From a business perspective, the need for the datamart is driven largely by the need to manage performance. Whether managing profit and loss, reviewing last month's sales, increasing cross-selling capability, or structuring a volume purchasing agreement, these examples all highlight an organization's need to provide relevant and timely access to business information. The need to consolidate information after a merger can also drive an increase in datamart usage.

Technology Issues
When looking through a prism to understand the technological reasons for datamarts, the reasons focus to a single point: consistently managing information. Datamarts provide a consistent means of addressing the following:

- operational vs. informational data
- business rules
- consistent business views
- production windows
- information availability

The remainder of this section provides additional detail on the points above.

Operational vs. Informational Data. It is commonplace for operational systems to fall short when it comes to reporting. This is because operational systems are designed with rapid data entry in mind. These systems capture the transaction in real time, are designed for high throughput, and require high availability. By contrast, informational systems can be "real-enough time": they provide information about the transactions, and updating informational data periodically meets most consumer needs. Consumers need the ability to flexibly query data in different ways; capturing the transaction is not important here. High availability of information is important but not critical to daily operations. From a systems design perspective, it follows that operational systems and informational systems have entirely different designs. The data itself, the degree of normalization, data availability, and indexing requirements are completely different.

Business Rules. Only a few people in the organization may truly understand the business rules which govern how to properly construct queries, or how to massage data so that it properly reflects how the business actually views it. A key aspect of datamart design is addressing these rules and assuring that they are consistently implemented as data is placed into the mart. By addressing the rules up front, consumers stand a much better chance of retrieving the right answer the first time.

Consistent Business Views. The requirement for users to have consistent business views of the data is critical. In the absence of a consistent business view, users often create queries that generate incorrect results. Having consistent business views minimizes risk and creates an environment where data may be made available to a broader consumer base.

Production Windows. When considering production windows, we must review both peak and off-peak periods. During peak periods, querying against an operational system negatively impacts system performance when optimal performance is needed most. During off-peak periods, it is common for longer queries to collide with nightly backup processes. In both cases, the move to a datamart lessens the impact of these conditions.
Information Availability. We briefly touched on the concept of real-enough-time data as it relates to information availability. By this we mean that information has periodicity (i.e., a frequency with which it is needed). Informational data is most often reported daily, weekly, or monthly; it is not needed up to the minute. While the operational system provides real-time data, few are the users, other than customer service staff, who require it for business reporting.[1]

These are the major technology drivers which lead organizations to build datamarts. Some additional reasons to consider: (1) information is expressed in business terminology, (2) formatting is in business terms, and (3) sort orders reflect those most often needed.
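The periodicity described above is what makes simple batch refreshes workable. As an illustrative sketch (the table and column names are hypothetical), a nightly build can pick up only the rows changed since the last run by keeping a stored watermark:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL, updated_at TEXT);
    CREATE TABLE mart_orders (order_id INTEGER, amount REAL);
    CREATE TABLE etl_watermark (last_run TEXT);
""")
con.execute("INSERT INTO etl_watermark VALUES ('2004-01-05')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2004-01-04"),   # already loaded in a prior run
    (2, 20.0, "2004-01-06"),   # new since the watermark
    (3, 30.0, "2004-01-07"),   # new since the watermark
])

def nightly_build(con):
    """Copy only rows changed since the last run, then advance the watermark."""
    (last_run,) = con.execute("SELECT last_run FROM etl_watermark").fetchone()
    con.execute("""
        INSERT INTO mart_orders
        SELECT order_id, amount FROM orders WHERE updated_at > ?
    """, (last_run,))
    con.execute("UPDATE etl_watermark SET last_run = "
                "(SELECT MAX(updated_at) FROM orders)")

nightly_build(con)
loaded = con.execute("SELECT order_id FROM mart_orders ORDER BY order_id").fetchall()
print(loaded)  # [(2,), (3,)] -- only the rows newer than the watermark
```

Running the build again immediately would load nothing, since the watermark has advanced; this is the "real-enough time" refresh cycle in miniature.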

Building Datamarts: Tool Choices


When considering tools to build datamarts, organizations have a wide range of choices. Two classes of tools exist: Extract, Transform, and Load (ETL) tools and data cleansing tools. This document focuses primarily on the former and to a lesser degree on the latter.

When we speak of ETL tools we refer to the traditional technology terms input, process, and output. Regardless of the marketing literature, these tools all move data from source to target. While the price tag for these tools is high, the benefit/cost ratio is more compelling. The ROI on these tools is significant because they:

- save enormous amounts of database programming,
- provide consistency from one employee to another,
- transfer the resource reliance from expensive DBA resources to technical non-programmer[2] resources,
- streamline key mart-building processes, and
- simplify complex logic.

While a build-it approach may at first seem sensible, the solution becomes more difficult to support as datamart complexity increases. Furthermore, the build-it approach requires expensive DBA time and is not typically accomplished with less expensive resources (i.e., technical non-programmers). It is commonplace for less expensive technical non-programmers to function well with ETL tools. The alternative is to have an expensive DBA build a series of tools and programs to accomplish the same function.

[1] Some die-hard managers will not let the idea of real-time data out of their sights. When this is so, a process walkthrough often exposes key process deficiencies that are usually sufficient to make the idea of real-time go away. For example, how real-time is an order when it sits on an employee's desk awaiting input?

[2] Do not underestimate the impact of having technical non-programmers do the bulk of the work with some DBA advisement.

Extract, Transform, and Load (ETL) Tools


This section describes the issues addressed by most ETL tools. While feature implementation varies widely between tools, these are the most commonly addressed features in the major products.

Database Platform Support. Most ETL tools provide native support for the major database platforms. The majors (IBM, Oracle, Sybase, and Microsoft) are nearly always supported. Support options vary for Informix, Ingres, and other tier-two players.

ODBC Connectivity. Most ETL tools have targeted support for bulk copy programs (BCP) and various data formats (e.g., CSV, fixed format, etc.).

Data Integration. The tools also provide support for heterogeneous application integration. ETL tools typically support various data formats, including ASCII, EBCDIC, and XML. There is also a growing trend to integrate with various third-party applications and ERP back-office systems.

Data Extraction. Moving data from multiple source environments into selected targets is a key feature. Business rules are applied as data is extracted.

Data Transformation. A significant portion of the datamart-building process is achieved during data transformation. At this step, conditional and mathematical transformations are performed. It is quite common to incorporate business-related data that is not included in the operational system during transformation. Business rules are applied here as well. In this phase, logical rules, transformations, substitutions, duplicate handling, and aggregations are performed.

Metadata Management. Metadata management is a key issue that the better tools address to varying degrees. Metadata sharing with third-party applications and bridges to business intelligence and OLAP environments are key features to watch for.

Scalability. Because the volumes of data being moved range from small databases to VLDB systems, ETL tools typically support load balancing and failover/fault tolerance, and allow parallel processing. These features are also supported to varying degrees.

Development and Administration. The tools all have graphical, centralized administrative environments. To varying degrees, they have load monitoring, auditing, and job scheduling features.
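To ground the extract, transform, and load phases described above, here is a minimal hand-rolled sketch in Python (the file contents and table names are hypothetical; real ETL tools layer metadata, scheduling, and scale on top of this basic shape):

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a source (a CSV held in a string for the demo).
source = io.StringIO("region,amount\neast,100\nwest,bad\neast,50\n")
rows = list(csv.DictReader(source))

# Transform: apply a business rule (reject rows that fail validation) and
# convert types; ETL tools let non-programmers express such rules visually.
clean = []
for r in rows:
    try:
        clean.append((r["region"].upper(), float(r["amount"])))
    except ValueError:
        pass  # a real pipeline would route bad rows to a reject file

# Load: write the transformed rows into the target mart table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mart_sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO mart_sales VALUES (?, ?)", clean)
total = con.execute("SELECT SUM(amount) FROM mart_sales").fetchone()[0]
print(total)  # 150.0 -- the row with amount 'bad' was rejected
```

Even this toy version shows why hand-built pipelines grow hard to maintain: every new rule, source, and target adds custom code that an ETL tool would manage declaratively.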

Data Cleansing Tools


Cleansing is the process of correcting irregularities in the data and consolidating overlapping data into a single composite record. Cleansing requires that data be broken down into its raw elements and rebuilt into a corrected version. Typically, noise words[3] are eliminated and abbreviations are replaced. When discussing data cleansing, we are referring to those tools that use heuristic, fuzzy-logic, and probability-match style algorithms to find ways to match and eliminate duplicates. These tools typically consider multiple rows at a time or have sophisticated databases of typical errors (a simplified example would be Microsoft Word's AutoCorrect feature). Data cleansing tools typically perform the following types of functions.

[3] Noise words are typically words like: and, but, for, or, nor, the, etc.

Elementizing. Breaks a data record into elements (e.g., the address 1 Pennsylvania Avenue is broken down into Street Number: 1, Street Name: Pennsylvania, Street Type: Avenue).

Standardizing. Standardizes fields to contain identical values for equivalent data elements (e.g., Street or ST, Lane or LN, Circle or CIR, etc.).

Verifying. Verifies that data is correct according to specific rules (e.g., the town of Stillwater, Oklahoma has the zip code 74074). If a record for Stillwater, Oklahoma carried a zip code of 19460, it would be flagged for follow-up.

Matching. Matches data in one record with data in other records (e.g., find all occurrences of ABC Company, Phoenixville, PA and make sure that the zip code is 19460, or find all occurrences of records for John Smith and make sure that the phone number is consistent across all of John Smith's records).

Supplementing. Adds rows or attributes, as is often required with demographic data.
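A toy sketch of the standardizing and matching functions above (the sample data is hypothetical, and real cleansing tools use far richer rule sets and probabilistic matchers), using only Python's standard library:

```python
import difflib

# Standardizing: map equivalent street-type abbreviations to one canonical form.
CANONICAL = {"st": "Street", "street": "Street",
             "ln": "Lane", "lane": "Lane",
             "cir": "Circle", "circle": "Circle"}

def standardize(address: str) -> str:
    """Replace a trailing street-type abbreviation with its canonical form."""
    parts = address.split()
    last = parts[-1].rstrip(".").lower()
    if last in CANONICAL:
        parts[-1] = CANONICAL[last]
    return " ".join(parts)

# Matching: flag probable duplicates using a fuzzy similarity score.
def probable_duplicates(names, threshold=0.85):
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = difflib.SequenceMatcher(None, names[i].lower(),
                                            names[j].lower()).ratio()
            if score >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

print(standardize("12 Oak ST"))  # 12 Oak Street
dups = probable_duplicates(["ABC Company", "ABC Compny", "XYZ Corp"])
print(dups)  # [('ABC Company', 'ABC Compny')]
```

The fuzzy matcher is the interesting part: an exact-match join would never catch "ABC Compny", which is precisely the class of error cleansing tools exist to find.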

When is a Tool Required?


The use of ETL tools is typically driven by process complexity, project size, and return on investment. In even the smallest of projects, the benefits obtained outweigh the initial investment. The comments below relate to projects both large and small and should be considered when making a decision about a tool.

Process Complexity. We commonly find that organizations have frequently changing business structures, rules embedded in reports, and a significant amount of custom coding in stored procedures and other utilities that must be tied together into job streams and scheduling systems. As mentioned previously, managing this complexity becomes cumbersome rapidly, even in the smallest of projects. In the longer view, most small implementations grow quickly into intermediate deployments once users become aware of the data that is available. Therefore, complexity increases with time. Add the fact that organizations change, people advance, etc., and a tool is the obvious choice.

Project Size. Our experience in observing our client base indicates that projects start small but typically grow rapidly as the consumer base benefits from information use. In a typical situation, an organization might start with a single datamart with several tables in the first six months, grow to several datamarts during the next year, and add a significant number of custom views and specialty tables in the second year. What was initially a simple implementation has grown substantially and requires significant management and process to maintain.

Return on Investment. The return on investment for ETL tools is driven primarily by two factors: (1) economic productivity and (2) process improvement implications. By economic productivity we mean that ETL tools have a lower operational cost structure because technical non-programmers (e.g., a business analyst who knows SQL) can accomplish much of the work with an appropriate level of DBA advisement.
There is no side-stepping the need for DBA advisement, especially during the design phase. Regarding process improvement implications, operational impacts are recognized almost immediately with ETL tools because duplicate work is eliminated, change control becomes real, following field mappings from sources to targets is easy, adding new output deliveries is reduced from paging through stored procedures to a few mouse clicks, and organizing the order of operations is possible with job scheduling capabilities. The integration with many leading metadata repositories adds to the list of process improvement benefits gained through ETL tools.
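Organizing the order of operations amounts to sequencing dependent builds into a job. A minimal sketch (the build names and dependencies are hypothetical) using the topological sort in Python's standard library (graphlib, Python 3.9+):

```python
from graphlib import TopologicalSorter

# Each build lists the builds it depends on; dimension loads must run
# before the fact load, which must run before the cube build.
dependencies = {
    "load_dim_product":  set(),
    "load_dim_location": set(),
    "load_fact_sales":   {"load_dim_product", "load_dim_location"},
    "build_sales_cube":  {"load_fact_sales"},
}

# A "job" is simply the builds executed in dependency order.
job = list(TopologicalSorter(dependencies).static_order())
print(job)  # dimensions first, fact load next, cube build last
```

An ETL tool's scheduler does exactly this kind of ordering (plus restarts, logging, and alerting) so that the sequence does not live in a fragile chain of hand-written scripts.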

Cognos Data Manager as a Tool Choice


Cognos Data Manager is designed with business intelligence in mind: it tightly integrates datamart processes with business intelligence processes. Many of Data Manager's features were discussed in the ETL tools section above. Rather than provide an additional list of features for Data Manager, we focus below on issues specific to managing larger Cognos implementations, where Data Manager adds value.

Cognos Implementation Level. We discussed previously that datamart complexity grows with time. Creating and managing the datamart represents only half of the job. The remaining half involves the creation and management of catalogs, query definitions, multi-dimensional cube models, cube builds, and ties with schedulers, all necessary aspects of a typical Cognos implementation. We find that clients move through three levels of implementation quite rapidly during their first few years with Cognos.

Implementation Level | Typical parameters
Level I              | 1 catalog; 1-4 cubes; 1-5 query definitions
Level II             | 2 or more catalogs; 5-9 cubes; 6-19 query definitions
Level III            | 2 or more catalogs; 10 or more cubes; 20+ query definitions
Level I clients will begin to realize the maintenance-related issues as their implementation and consumer demands grow. Level II clients are best positioned to realize the benefits of Data Manager because they are familiar with the Cognos infrastructure and its maintenance-related issues; they will quickly appreciate the operational benefits that Data Manager provides. When a client reaches Level III, the complexity level is much greater, and so is the benefit that Data Manager provides. Data Manager gives the Cognos client an edge because it builds catalogs, Impromptu query definitions, and baseline cube models as a natural by-product. Manually building each of these deliveries is quite time consuming. The tool's impact is a lower administrative burden for both the datamart and Cognos portions of a company's decision support implementation.

Shared Dimensional Framework. The most commonly shared dimensions across all functions (i.e., finance, sales, purchasing, human resources, etc.) are product, location, time, and organizational structure. These dimensions and others are used in all functional datamarts. Cognos Data Manager provides a reuse capability for managing the dimensions and sharing them with marts as appropriate. Because Data Manager uses a shared dimension concept, leveraging dimensions between marts while maintaining each dimension only once is straightforward. The leveraging effect facilitates expansion beyond departmental applications by encouraging reuse of common dimensions throughout the organization.

Merging and Aggregations. As we discuss merging and aggregations, we must remember that Data Manager was built with business intelligence in mind. By this we mean that it uses dimensionality both to merge data and to generate various summary levels of output. Its merge and aggregation engine is designed for business intelligence applications, and the process of creating new summary-level outputs is tied to the dimensionality within a Data Manager build. A key operational benefit of Data Manager's design is the single pass over the source data. When building marts with millions of source records, the impact of a single-pass product on operational windows is a significant benefit.

A Single-Pass Engine. Level II and III implementations typically produce multiple cubes that rely upon a shared query base for building cube targets. Frequently, queries are executed multiple times in order to build all cubes. Data Manager can significantly reduce the number of queries required to build the entire cube base because of its single-pass engine. The obvious impact is shorter cube build times, avoiding collisions with backups, and the ability to load most data during off-peak hours.

Business Intelligence Change Control. Data Manager provides the capability to manage a complete Cognos implementation within its framework. This includes change control of Impromptu catalogs and cube models. While a single field addition does lead to a set of cascading changes in the Cognos world, Data Manager's change control method simplifies this process greatly; it ultimately protects production processes from minor changes.
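The single-pass idea can be sketched as computing several summary levels during one scan of the source rows, rather than re-querying the source once per output (the data and level names below are illustrative, not Data Manager's actual mechanism):

```python
from collections import defaultdict

# Source facts: (product, location, amount). One scan feeds every summary.
facts = [("Widget", "East", 100.0), ("Widget", "West", 50.0),
         ("Gadget", "East", 75.0), ("Widget", "East", 25.0)]

by_product = defaultdict(float)
by_location = defaultdict(float)
grand_total = 0.0

# Single pass: each source row updates all summary levels at once.
for product, location, amount in facts:
    by_product[product] += amount
    by_location[location] += amount
    grand_total += amount

print(dict(by_product))   # {'Widget': 175.0, 'Gadget': 75.0}
print(dict(by_location))  # {'East': 200.0, 'West': 50.0}
print(grand_total)        # 250.0
```

With millions of source records, reading them once instead of once per cube is what shrinks the production window.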

Summary
Although companies have completed large, multi-million-dollar integrated system projects, the need for integrated information persists to this day. In many cases, providing consumers with timely answers to critical business questions requires a two-pronged solution: first, that organizations build datamarts to house and integrate key business data; second, that reporting systems be built to provide consumers with access to timely information, including tools for reporting, analytics, and ad-hoc query. ETL tools are the key facilitator used to build datamarts. They address numerous process and human issues which almost immediately change the "build it or buy it?" question into a "buy it" decision.[4] ETL tools have a common set of features which vary in degree from product to product. While Data Manager has many standard ETL features, it also addresses issues that are unique to Cognos shops. To the business person who has little interest in ETL, datamarts are key to providing timely information and building reporting systems which are ultimately used to manage performance. To the technologist, ETL tools provide a distinct advantage from a process management, human resource management, and economic perspective. While ETL tools are not silver bullets, the value proposition they provide is compelling, especially in organizations with growing decision support needs and changing workforce populations.

[4] There are conditions where business logic is simply so complex that the only workable solution is custom programming (e.g., COBOL).
