Optimize Your Business Data Warehouse
https://www.coursera.org/specializations/data-warehousing
Evaluate business needs, design a data warehouse, and integrate and visualize
data using dashboards and visual analytics.
This Specialization covers data architecture skills that are increasingly critical across a broad
range of technology fields. You’ll learn the basics of structured data modeling, gain practical
SQL coding experience, and develop an in-depth understanding of data warehouse design and
data manipulation. You’ll have the opportunity to work with large data sets in a data
warehouse environment, using MicroStrategy, a leading BI tool, and its OLAP (online analytical
processing) and Visual Insights capabilities, to create dashboards and visual analytics.
In the final Capstone Project, you'll
apply your skills to build a small, basic data warehouse, populate it with data, and create
dashboards and other visualizations to analyze and communicate the data to a broad audience.
University of Colorado
* The scope and integration level are important factors in determining an appropriate
architecture.
- Scope refers to the breadth of an organization supported by a data warehouse. Scope can be
measured in various ways, such as the number of data sources used, the number of source systems
providing inputs, or the number of organizational units providing inputs to or using a data warehouse.
- Integration level refers to several data quality indicators across data sources. Integration level
involves completeness, consistency, conformity, and duplication across data sources, not within
individual data sources.
* Two opposing approaches provide limits on architectures for data warehouses.
- The top-down approach, known as the enterprise data warehouse, provides a single data
warehouse for an organization. Through logical centralization, the top-down approach achieves a
high integration level and a large scope.
--> The top-down architecture has three tiers: a staging area, the data warehouse, and data marts.
- The staging area provides temporary storage of transformed data before loading into
the data warehouse. The staging area is particularly useful for data warehouses with a
large number of operational databases and external data sources requiring complex
data transformations.
- The enterprise data model, known as an EDM, defines the structure of the data warehouse in the
detail required to access operational databases and external data sources. Departmental users
generally need access to small portions of the data warehouse instead of the entire data warehouse.
- To provide them with faster access, while isolating them from data needed by other
user groups, smaller data warehouses called data marts are often used. Data marts act
as the interface between business analysts and the enterprise data warehouse, storing a
derived part of the data warehouse, and refreshing this derived part on a periodic basis.
- The bottom up approach provides independent data marts with lower integration levels. A data
mart is a data warehouse with a small scope justified by tangible benefits from critical business
reporting needs and lower development risks from smaller project sizes.
- In a bottom-up architecture, important business reporting requirements
spanning operational databases drive the development of independent data marts. Each
data mart is relatively small in scope supporting several organizational units and data
sources. The integration is limited because of the small scope of data marts. The set of
data marts may evolve into a large data warehouse if the organization can justify the
expense of building an enterprise data model.
One approach, known as the data mart bus approach, provides an additional layer of data
integration to transform selected data from independent data marts. The level of data
integration in the data mart bus approach is much less than the level in an enterprise
data warehouse.
* The data warehouse maturity model provides a framework to view an organization's progress and
to guide investment decisions.
h) Employment opportunities
i) Course software requirements
For this course, you need to install a database server (Oracle or MySQL), Pivot4J plugin for the
Pentaho Business Analytics Platform, and Pentaho Data Integration. For both Pentaho products, you
should install the community editions. You will use Pivot4J in module 2 and Pentaho Data Integration
in module 5. The database server is used in module 5 as part of using Pentaho Data Integration. You
should also install a database client to create and populate the tables used in module 5.
WEEK 2
The slice summarize operator replaces one or more dimensions with summary calculations. The
summary calculation often indicates a total value across members or the central tendency of the
dimension, such as the average or median value.
The dice operator replaces a dimension with a subset of values of the dimension.
This example shows the result of the dice operation to display sales for the USA states of Utah,
Arizona, and Colorado for January 1, 2013.
A dice operation typically follows a slice operation and returns a subset of the cells displayed in the
preceding slice.
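In SQL terms, a dice like the one above can be sketched as a query over a hypothetical star schema (a Sales fact table joined to Store and TimeDim dimension tables; the table and column names are assumptions, not the course's actual schema):

    SELECT s.StoreState, SUM(f.SalesDollar) AS SumSales
    FROM Sales f
    JOIN Store s   ON f.StoreNo = s.StoreNo
    JOIN TimeDim t ON f.TimeNo  = t.TimeNo
    WHERE s.StoreState IN ('UT', 'AZ', 'CO')      -- dice: subset of the Store dimension
      AND t.TimeDate = DATE '2013-01-01'          -- slice fixed to a single day
    GROUP BY s.StoreState;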
* The drill-down operator allows analysts to navigate from a broader level to a narrower or more
detailed level, such as navigating from product group to individual products.
Roll-up is the opposite of drill-down. Roll-up involves moving from a more detailed level to a broader
level of a hierarchical dimension.
Pivot tables have become the standard interface for data cubes.
Overview of WebPivotTable
A table design for a data warehouse typically consists of dimension tables connected to fact tables
in one-to-many relationships. The attributes of a dimension map to columns in a dimension
table. Fact tables contain measures as well as foreign keys to dimension tables.
A common error early in the design process is to omit dimensions related to a fact table.
The size of a fact table can be estimated using the cardinality of each dimension discounted by the
rate of sparsity. Sparsity increases as the number of dimensions increases and the cardinality of a
dimension increases.
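A hypothetical worked example (the numbers are illustrative only): with dimensions of cardinality 1,000 items, 100 stores, and 365 days, the full cube has 1,000 x 100 x 365 = 36.5 million cells; with 90 percent sparsity (only 10 percent of item/store/day combinations actually have sales), the fact table holds roughly 3.65 million rows.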
Example:
The star schema can represent one data cube. This ERD consists of four dimension entity types, Item,
Customer, Store, and TimeDim, along with one transaction fact entity type called Sales.
When converted to a table design, the Sales table has foreign keys to each dimension table, Item,
Customer, Store, and TimeDim.
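As a minimal sketch of that conversion (column names and data types are assumptions, not the course's distributed scripts), the Sales fact table might be declared as:

    CREATE TABLE Sales
    ( SalesNo     INTEGER PRIMARY KEY,                         -- artificial identifier
      ItemNo      INTEGER NOT NULL REFERENCES Item(ItemNo),    -- FK to Item dimension
      CustNo      INTEGER NOT NULL REFERENCES Customer(CustNo),
      StoreNo     INTEGER NOT NULL REFERENCES Store(StoreNo),
      TimeNo      INTEGER NOT NULL REFERENCES TimeDim(TimeNo),
      SalesUnits  INTEGER,                                     -- measure
      SalesDollar DECIMAL(12,2) );                             -- measure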
* For related business processes that share some of the dimension tables, a star schema can be
extended into a constellation schema with multiple fact entity types, as shown in this ERD.
- When converted to a table design, the Inventory entity type becomes a fact table, and its one-to-many
relationships become foreign keys in the fact table.
- The Inventory entity type contains a number of measures, including the quantity on hand of an
item, the cost of an item, and the quantity returned.
- All dimension tables are shared among both fact tables, except for the supplier and customer
tables.
The constellation pattern is important because it shows shared dimensions to represent multiple
data cubes.
* A snowflake schema has multiple levels of dimension tables related to one or more fact tables.
In this ERD, the Store dimension has been split into two entity types, Store and Division, along with a
one-to-many relationship from Division to Store.
You should consider the snowflake schema instead of the star schema for small dimension tables
that are not fully normalized.
The Store table in the previously shown star schema is not fully normalized because division id
determines division name and division manager.
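A minimal sketch of the snowflake split (hypothetical column names):

    CREATE TABLE Division
    ( DivNo      INTEGER PRIMARY KEY,
      DivName    VARCHAR(50),
      DivManager VARCHAR(50) );

    CREATE TABLE Store
    ( StoreNo    INTEGER PRIMARY KEY,
      StoreCity  VARCHAR(50),
      StoreState VARCHAR(20),
      DivNo      INTEGER NOT NULL REFERENCES Division(DivNo) );  -- 1-M from Division to Store

In the star schema version, DivName and DivManager would instead be repeated in each Store row.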
* Time representation is crucial for data warehouses, because most data warehouse queries use
time in conditions.
The principal use of time is to record the occurrence of facts. The simplest representation is a
timestamp data type for a column in a fact table.
In place of a timestamp column, many data warehouses use a foreign key to a time dimension table,
as shown in previous examples in this lesson.
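A hedged sketch of a time dimension table (hypothetical columns) that a fact table references through a TimeNo foreign key instead of storing a raw timestamp:

    CREATE TABLE TimeDim
    ( TimeNo        INTEGER PRIMARY KEY,
      TimeDate      DATE,
      TimeMonth     INTEGER,
      TimeQuarter   INTEGER,
      TimeYear      INTEGER,
      TimeDayOfWeek INTEGER );

Queries can then group or filter on precomputed attributes such as month, quarter, or year without applying date functions to every fact row.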
Summarizability patterns involve the relationships among dimension levels and the relationships from
dimension tables to fact tables.
* Drill-down incompleteness: in this example, the parent college level drills down to the department child level with a smaller
total. The enrollment in the business college is omitted in the department level because the business
college does not have departments.
* Roll up incompleteness is the reverse of drill down incompleteness.
Rolling up from a child, that is a finer level, to a parent, that is a coarser level, shows a smaller total
indicating that measured values attributed to child members have not been allocated to parent
members.
In this example, the product child level rolls up to the category parent level with a smaller total.
The values of minimum and maximum cardinalities determine schema patterns for the three
dimension summarizability problems.
- The drill-down incompleteness problem involves a minimum cardinality of zero for the parent
entity type, as shown in ERD A.
- The roll-up incompleteness problem involves a child's minimum cardinality of zero, as shown in
ERD B.
Without an appropriate methodology, the best efforts will likely fail to produce a data warehouse
with high value for an organization.
Artifacts of a data warehouse design are dimensional models, a schema design using patterns
presented in other lessons, data integration procedures, and data marts to support business analysis.
* The demand-driven data warehouse design methodology, also known as the requirements-driven
approach, first proposed by Kimball in 1988, is one of the earliest data warehouse design
methodologies.
The demand-driven methodology has three phases: identifying data marts and the subsets of user
requirements they serve, building a matrix relating data marts and dimensions, and designing fact
tables.
* The supply driven methodology emphasizes the analysis of existing data sources.
Entities in ERDs of existing data sources are analyzed to provide a starting point for the data
warehouse design.
The supply-driven methodology has three phases: classify entities, refine dimensions, and refine the
schema.
* The hybrid data warehouse design methodology, proposed in 2001, combines the demand-driven
and supply-driven methodologies.
The hybrid methodology involves a demand-driven stage, a supply-driven stage, and a third
stage to integrate the demand-driven and supply-driven stages.
The demand and supply stages could be done independently as shown in this diagram.
The overall emphasis in the hybrid approach is to balance demand and supply aspects of data
warehouse design, possibly aided by automated tools. The demand-driven stage collects
requirements using the goal-question-metric (GQM) approach.
The GQM approach provides informal guidelines to define measures and dimensions from the
goals.
The second step of the hybrid methodology involves analysis of existing ERDs. The methodology
provides guidelines to identify fact and dimension tables in existing ERDs. Potential fact tables are
identified based on the number of additive attributes. Dimension tables are involved in one-to-many
relationships with fact tables.
The third step of the hybrid methodology integrates the dimensional model from the demand stage
and the star schema from the supply stage.
WEEK 4
Data Integration Concepts, Processes, and Techniques
Concepts of data integration processes
Data integration adds value to disparate data sources that contribute to enterprise data
warehouses. The primary goal of data integration is to provide a single source of truth for decision-
making.
Refresh processing updates the data warehouse with change data from source systems; it involves a
preparation phase, an integration phase, and an update phase.
- The Preparation Phase manipulates change data from individual source systems.
- Extraction retrieves data from individual data sources.
- Transportation moves extracted data to a staging area.
- Cleaning involves a variety of tasks to standardize and improve the quality of the extracted
data.
- Auditing records results of the cleaning process, performing completeness
and reasonableness checks and handling exceptions.
- The integration phase merges separate clean sources into one source.
- Merging can involve removal of inconsistencies among the source data.
- Auditing records results of the merging process, performing completeness
and reasonableness checks, and handling exceptions.
- The update phase involves propagating the integrated change data to various parts of the data
warehouse.
After propagation, notification can be sent to user groups and administrators. In addition to
periodic refreshment, data integration involves initial population of a data warehouse.
Change data concepts
This lesson emphasizes change data used in data integration processes to populate and refresh your
data warehouse.
- Source system requirements involve modifications to source systems to acquire change data.
Typical changes to source systems are new columns, such as timestamps required
for queryable change data, and trigger code required for cooperative change data.
- Processing level involves resource consumption and development required for data
integration procedures.
Logs and snapshot change data involve substantial processing.
1) Cooperative change data involve notification from a source system about changes. The notification
typically occurs at transaction time using a trigger.
A trigger is a rule executed by a DBMS when an event occurs, such as inserting a new row.
A trigger involves software development and execution as part of a source system. Cooperative
change data can be input immediately into a data warehouse or placed in a queue or staging area
for later processing, possibly with other changes.
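A minimal Oracle-style sketch of cooperative change data capture (the Orders source table, the queue table, and all column names are hypothetical):

    CREATE TABLE OrdersChangeQueue
    ( OrdNo      INTEGER,
      ChangeType CHAR(1),
      ChangeTime TIMESTAMP );

    CREATE OR REPLACE TRIGGER Orders_Change_Tr
    AFTER INSERT OR UPDATE ON Orders
    FOR EACH ROW
    DECLARE
      v_type CHAR(1);
    BEGIN
      IF INSERTING THEN v_type := 'I'; ELSE v_type := 'U'; END IF;
      -- record the change so a later data integration run can pick it up
      INSERT INTO OrdersChangeQueue (OrdNo, ChangeType, ChangeTime)
      VALUES (:NEW.OrdNo, v_type, SYSTIMESTAMP);
    END;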
2) Logged change data involve files that record changes or other user activity.
For example, a transaction log contains every change made by transactions, and a web log contains
page access histories, called clickstreams, of website visitors.
3) Queryable change data come directly from a data source via a query.
Queryable change data require time stamping in a data source. Since few data sources contain time
stamps for all data, queryable change data usually are augmented with other kinds of change data.
Queryable change data are most applicable for fact tables using columns such as order date,
shipment date and hire date, that are stored at operational data sources.
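A hedged sketch of a queryable change data extract, assuming a hypothetical Orders source table whose OrdDate column serves as the timestamp:

    SELECT OrdNo, CustNo, OrdDate, OrdAmount
    FROM Orders
    WHERE OrdDate > DATE '2013-01-01';   -- orders entered since the previous extract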
4) Snapshot change data involve periodic dumps of source data. To derive change data, a difference
operation uses the two most recent snapshots. The result of a difference operation is called a delta.
Generating a delta involves comparing source files to identify new rows, changed rows, and deleted
rows.
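A minimal sketch of a delta computed from two snapshots (table names are hypothetical; MINUS is Oracle syntax, other DBMSs use EXCEPT):

    -- new or changed rows since the previous snapshot
    SELECT * FROM CustomerSnapCurrent
    MINUS
    SELECT * FROM CustomerSnapPrevious;

    -- deleted or changed rows: run the difference in the other direction
    SELECT * FROM CustomerSnapPrevious
    MINUS
    SELECT * FROM CustomerSnapCurrent;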
Data cleaning tasks
Parsing decomposes complex objects, usually text, into their constituent parts.
For data integration, parsing is important for decomposing multi-purpose text data into individual
fields.
Pattern matching with regular expressions
Regular expressions specify patterns for parsing text fields with multiple components, common in
data integration tasks.
Regular expression tools are widely supported in data integration tools, DBMSs, application
programming interfaces, and website testing tools.
(..........
..................
..................)
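As one hedged illustration of a regular expression in SQL (Oracle's REGEXP_LIKE; the CustomerStage table and Phone column are hypothetical), this query keeps only rows whose phone number matches an NNN-NNN-NNNN pattern:

    SELECT CustNo, Phone
    FROM CustomerStage
    WHERE REGEXP_LIKE(Phone, '^[0-9]{3}-[0-9]{3}-[0-9]{4}$');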
Entity matching identifies duplicate records in two or more data sources when no common reliable
identifier exists.
The classic application involves identification of duplicate customers in data sources from different
firms.
Because a common identifier does not exist, duplicates must be identified from other common fields
such as names, address components, phone numbers and ages.
Because these common fields come from different data sources, inconsistencies and nonstandard
representations may exist, complicating the matching process.
Matched entities can be merged or linked. If merging two entities, sometimes old data from one
source is discarded. In addition, new fields can be added to obtain data unique to each data
source. Linking maintains separate entities but notes relationships.
Entity matching approaches use distance functions to determine if quasi identifiers in two entities
indicate the same entity.
In a geometric sense, distance is the amount of space between two points.
For entity matching, a point is a combination of values, one from each quasi-identifier. Numeric quasi-
identifiers are easy to compare, but text quasi-identifiers can be difficult to compare.
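A hedged sketch of approximate matching on a text quasi-identifier using Oracle's UTL_MATCH package (the CustomerA and CustomerB staging tables, their columns, and the similarity threshold are assumptions):

    SELECT a.CustNo AS CustA, b.CustNo AS CustB,
           UTL_MATCH.JARO_WINKLER_SIMILARITY(a.CustName, b.CustName) AS NameSim
    FROM CustomerA a
    JOIN CustomerB b ON a.CustZip = b.CustZip      -- cheap blocking field to limit comparisons
    WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY(a.CustName, b.CustName) >= 90;  -- likely duplicates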
WEEK 5
Architectures, Features, and Details of Data Integration Tools
Architectures and marketplace
To support the complexity of data integration processes to populate and refresh a data warehouse,
software products for data integration have been developed.
b) The extraction, loading, and transformation architecture, abbreviated as ELT, uses a relational
DBMS to perform transformations after extraction and loading.
- The ELT Architecture is designed to utilize high performance features of the enterprise DBMSes.
- ETL architecture supporters emphasize the DBMS independence of ETL engines, while ELT architecture
supporters emphasize the superior optimization technology in relational DBMS engines.
- ETL architectures can usually support more complex operations in a single transformation than ELT
architectures, but ELT architecture may use less network bandwidth.
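A minimal sketch of the ELT pattern: raw rows are first loaded into a staging table, then the DBMS performs the transformation with set-oriented SQL (StageOrders, TimeDim, and Sales are hypothetical tables):

    INSERT INTO Sales (SalesNo, ItemNo, CustNo, StoreNo, TimeNo, SalesDollar)
    SELECT s.OrdNo, s.ItemNo, s.CustNo, s.StoreNo, t.TimeNo, s.OrdAmount
    FROM StageOrders s
    JOIN TimeDim t ON t.TimeDate = s.OrdDate;   -- transformation executes inside the DBMS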
- High execution indicates large firm size and resources to support current and future products.
- High vision indicates a firm with broad integrated product offerings.
- Workflow and component specifications are the most salient parts of an IDE for data integration.
- A repository is a design database for all features of a data integration tool.
The repository stores design objects and relationships, maintains dependencies among design
objects, and provides documentation, with predefined reports and convenient formatting.
- Job management supports scheduling and monitoring of jobs for executing data integration
workflows.
Scheduling typically handles base schedules, repetition, conditional execution, and stepping through
a workflow to identify problems.
Monitoring provides logging of events, performance alerts, and reports.
- Data profiling helps a data warehouse administrator and data owners understand and improve
the quality of data sources. A data profiling tool can reduce unexpected delays and surprises in
populating and refreshing a data warehouse.
Course conclusion
COURSE 3: Relational Database Support for
Data Warehouses
https://www.coursera.org/learn/dwrelational
WEEK 1
* The failure of operational databases to support higher level decision making is a combination of
inadequacy of database technology and limitations in employment of databases.
- Performance limitation
Organizations and DBMS vendors discovered that a single database could not be configured to provide
adequate performance for both transaction processing and business intelligence processing.
- Lack of integration
Organizations discovered that the lack of integration among operational databases hindered higher
level decision making.
This lack of integration was not a design failure; operational databases were designed primarily to
support transaction processing, not business intelligence processing.
- Missing data management feature
Organizations experienced unexpected development difficulties and poor performance for business
intelligence applications.
These difficulties were unexpected because of the promise of nonprocedural specification in SQL
and optimizing compilers.
Relational database schema patterns
* The traditional schema pattern for a data warehouse is known as the star schema,
--> consisting of one fact entity type per table, surrounded by the dimension entity types or tables in
one-to-many relationships. The star schema represents a single data cube.
This ERD consists of four dimension entity types, Item, Customer, Store, and TimeDim, along with
one transaction fact entity type called Sales. When converted to a table design, the Sales table has
foreign keys to each dimension table, Item, Customer, Store, and TimeDim.
In some designs, the fact entity type depends on the related dimension entity types for its primary
key. Since fact tables can have many relationships, it is generally preferred to use an artificial
identifier rather than a large combined primary key.
In this example, SalesNo is an artificial identifier for the sales entity type.
* For related business processes that share some of the dimensions, a star schema can be extended
into a constellation schema with multiple fact entity types.
This constellation ERD contains two fact entity types, Sales and Inventory, along with five
dimension entity types, some of which are shared between the fact entity types.
The constellation pattern is important, because it shows shared dimensions to represent multiple
data cubes. A data warehouse for a medium-size business may have many data cubes.
For a complex constellation diagram, a matrix provides a convenient mapping of dimensions and
facts. This matrix shows a mapping of the simplified schema diagram shown previously.
Most dimensions appear in two schemas. Shared dimensions are conformed or standardized across
schemas.
Supplier and Store are used in only one schema each.
* You should consider the snowflake schema instead of the star schema for small dimension tables
that are not fully normalized.
- The Store table in the previously shown star schema is not fully normalized because DivisionId
determines DivisionName and DivisionManager. Since the Store table is relatively small, query
performance will not suffer much from the need to join the Division table in a snowflake design.
- For large dimension tables, such as customer, however, query performance may suffer with extra
join operations required in a snowflake design.
* The inventory data warehouse supports business intelligence about inventory cycles. Inventory
bought, sold, consumed, and produced, is the heart of any manufacturing or distribution company.
- Inventory transactions are frequent and common.
- The volume and significance of inventory transactions makes them important in a data warehouse
design.
- The work order, sales, and purchase life cycles, affect the perpetual inventory balance as shown in
this diagram.
Before starting on assignments in Modules 2, 3, and 4, you need to create and populate the
inventory data warehouse tables. The course website contains files with CREATE TABLE and INSERT
statements for each table.
SalesDWStatements.txt (course download link)
InventoryDW.doc
The CREATE and INSERT statements for the Inventory data warehouse tables.
InventoryStarCREATEStatementOracle.txt
InventoryStarINSERTStatementOracle.txt
WEEK 2
The CUBE operator generates a complete set of subtotals for a collection of columns.
- CUBE is only appropriate for independent columns, usually fewer than four columns.
- The column specification is order independent because the operator generates all possible
subtotals. You can understand the results of a CUBE operation by comparison to the normal
GROUP BY results.
You can understand the results of a ROLLUP operation by comparison to the normal GROUP BY
results.
- The only syntax difference in the abbreviated statements is the ROLLUP operator in the GROUP
BY clause. Complete SELECT statements appear later in the lesson.
- The ROLLUP result contains additional rows for sub totals.
- The first six rows are identical in both results. Dashes in the ROLLUP results indicate subtotal
rows.
- For example, the seventh row shows the sum of sales as 325 in 2012, across all months.
- The last row with two dashes, shows the grand total of 675.
- This ROLLUP result, grouping on two columns (year and month), contains two sets of subtotal rows:
subtotals for each year and the grand total.
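A hedged sketch of the ROLLUP query behind the result described above (Oracle-style; the SalesFact and TimeDim names are hypothetical). Swapping ROLLUP for CUBE would instead generate every subtotal combination:

    SELECT t.TimeYear, t.TimeMonth, SUM(f.SalesDollar) AS SumSales
    FROM SalesFact f
    JOIN TimeDim t ON f.TimeNo = t.TimeNo
    GROUP BY ROLLUP (t.TimeYear, t.TimeMonth);
    -- one row per (year, month), a subtotal row per year (TimeMonth is NULL),
    -- and a grand total row (both grouping columns NULL)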
SQL GROUPING SETS operator
The GROUPING SETS operator provides complete flexibility to generate any set of subtotals; even the
normal GROUP BY results are not generated by default.
- To achieve this flexibility, you must provide an explicit specification of subtotal groups, which is
more tedious than the specification for the CUBE and ROLLUP operators.
- The explicit specification for the grouping sets operator has some similarity with writing a
union query to generate subtotals. For both the union query and grouping sets operator, you
must know the collections of subtotals to generate.
- The GROUPING SETS operator explicitly specifies the complete set of subtotals.
- The new syntax in this statement is the GROUPING SETS keyword with a list of subtotal groups
after the GROUP BY keywords. The GROUPING SETS operation contains four subtotal groups: the
combination of StoreZip and TimeMonth, StoreZip by itself, TimeMonth by itself, and the
grand total, specified as empty parentheses.
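A hedged sketch of that GROUPING SETS specification (Oracle-style; the SalesFact, Store, and TimeDim names are hypothetical):

    SELECT s.StoreZip, t.TimeMonth, SUM(f.SalesDollar) AS SumSales
    FROM SalesFact f
    JOIN Store s   ON f.StoreNo = s.StoreNo
    JOIN TimeDim t ON f.TimeNo  = t.TimeNo
    GROUP BY GROUPING SETS ((s.StoreZip, t.TimeMonth), (s.StoreZip), (t.TimeMonth), ());
    -- four subtotal groups: StoreZip/TimeMonth combinations, StoreZip subtotals,
    -- TimeMonth subtotals, and the grand total from the empty parentheses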
Window Comparisons I
Window Comparisons II
WEEK 4
WEEK 5
Data Governance
Closing
COURSE 4: Business Intelligence Concepts,
Tools, and Applications
https://www.coursera.org/learn/business-intelligence-tools
WEEK 1
Course Overview
Lesson 1.1: Overview of Decision Making
WEEK 2
Graded Quiz #1: Please complete this GRADED quiz, which covers content from Modules 1
and 2
WEEK 3
WEEK 4
WEEK 5
Graded Quiz #2: Please complete this GRADED quiz, which covers content from Modules 3,
4 and 5.
Closing Video