
Harness Business Data

Build a fully-optimized business data warehouse in five courses.

https://www.coursera.org/specializations/data-warehousing

About this Specialization

Evaluate business needs, design a data warehouse, and integrate and visualize data using dashboards
and visual analytics.

This Specialization covers data architecture skills that are increasingly critical across a broad
range of technology fields. You'll learn the basics of structured data modeling, gain practical
SQL coding experience, and develop an in-depth understanding of data warehouse design and
data manipulation. You'll have the opportunity to work with large data sets in a data
warehouse environment, using MicroStrategy, a leading BI tool, and its OLAP (online analytical
processing) and Visual Insights capabilities to create dashboards and visual analytics. In the
final Capstone Project, you'll apply your skills to build a small, basic data warehouse, populate
it with data, and create dashboards and other visualizations to analyze and communicate the
data to a broad audience.

University of Colorado

COURSE 2: Data Warehouse Concepts, Design, and Data Integration
https://www.coursera.org/learn/dwdesign
WEEK 1

Data Warehouse Concepts and Architectures


f) Data warehouse architectures and maturity
* Architecture refers to the organization of components to support specified goals.
For data warehouses, business goals drive technology choices, so architecture is primarily a business
issue, not a technology issue.

* The scope and integration level are important factors in determining an appropriate
architecture.
- Scope refers to the breadth of an organization supported by a data warehouse. Scope can be
measured in various ways, such as the number of data sources used, the number of source systems
providing inputs, or the number of organizational units providing inputs or using a data warehouse.
- Integration level refers to several data quality indicators across data sources. Integration level
involves completeness, consistency, conformity, and duplication across data sources not within
individual data sources.
* Two opposing approaches provide limits on architectures for data warehouses.
- The top-down approach, known as the enterprise data warehouse, provides a single data
warehouse for an organization. Through logical centralization, the top-down approach provides a high
level of integration across a large scope.

--> top-down architecture with three tiers: staging area, data warehouse, and data marts.
- The staging area provides temporary storage of transformed data before loading into
the data warehouse. The staging area is particularly useful for data warehouses with a
large number of operational databases and external data sources requiring complex
data transformations.
- The enterprise data model (EDM) defines the structure of the data warehouse in the detail
required to access operational databases and external data sources. Departmental users generally
need access to small portions of the data warehouse rather than the entire data warehouse.
- To provide them with faster access, while isolating them from data needed by other
user groups, smaller data warehouses called data marts are often used. Data marts act
as the interface between business analysts and the enterprise data warehouse, storing a
derived part of the data warehouse, and refreshing this derived part on a periodic basis.
- The bottom up approach provides independent data marts with lower integration levels. A data
mart is a data warehouse with a small scope justified by tangible benefits from critical business
reporting needs and lower development risks from smaller project sizes.
- In a bottom-up architecture, important business reporting requirements
spanning operational databases drive the development of independent data marts. Each
data mart is relatively small in scope, supporting several organizational units and data
sources. Integration is limited because of the small scope of data marts. The set of
data marts may evolve into a large data warehouse if the organization can justify the
expense of building an enterprise data model.
One approach, known as the data mart bus approach, provides an additional layer of data
integration to transform selected data from independent data marts. The level of data
integration in the data mart bus approach is much less than the level in an enterprise
data warehouse.

* The data warehouse maturity model provides a framework to view an organization's progress and
to guide investment decisions.
h) Employment opportunities
i) Course software requirements
For this course, you need to install a database server (Oracle or MySQL), Pivot4J plugin for the
Pentaho Business Analytics Platform, and Pentaho Data Integration. For both Pentaho products, you
should install the community editions. You will use Pivot4J in module 2 and Pentaho Data Integration
in module 5. The database server is used in module 5 as part of using Pentaho Data Integration. You
should also install a database client to create and populate the tables used in module 5.

WEEK 2

Multidimensional Data Representation and Manipulation


Data cube representation
* A data cube provides a multidimensional arrangement of data: factors appear as dimensions, and
quantitative variables appear as measures in the cells of the data cube.
* A dimension is a subject label for a row or column. For example, a dimension may be city size or
type of health plan offered.
* A data cube is multi-dimensional. It is not limited to two or three dimensions.
Data cube operators
Using the slice operator, a business analyst can focus on a subset of dimensions by replacing a
dimension with a single member value.
For example, this slice operation replaces the time dimension with the single date January 1,
2013. This slice operation only shows the front face of the data cube, with the first value of depth or
z-axis, January 1, 2013. A variation of the slice operator allows the decision maker to summarize
across members, rather than focus on just one member.

The slice summarize operator replaces one or more dimensions with summary calculations. The
summary calculation often indicates a total value across members or the central tendency of the
dimension, such as the average or median value.
The dice operator replaces a dimension with a subset of values of the dimension.
This example shows the result of the dice operation to display sales for the USA states of Utah,
Arizona, and Colorado for January 1, 2013.
A dice operation typically follows a slice operation and returns a subset of the cells displayed in the
preceding slice.
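In relational terms (a rough analogy, assuming the Sales star schema used later in this course; column
names such as StoreState, TimeDate, and SalesDollar are assumptions), a slice corresponds to fixing one
dimension with an equality condition and a dice to restricting a dimension to a subset of members:

  -- Sketch: slice (one time member) and dice (subset of state members) as SQL filters
  SELECT s.StoreState, SUM(f.SalesDollar) AS TotalSales
  FROM Sales f
    JOIN Store s   ON f.StoreId = s.StoreId
    JOIN TimeDim t ON f.TimeNo = t.TimeNo
  WHERE t.TimeDate = DATE '2013-01-01'        -- slice: January 1, 2013
    AND s.StoreState IN ('UT', 'AZ', 'CO')    -- dice: Utah, Arizona, Colorado
  GROUP BY s.StoreState;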

* The drill-down operator allows analysts to navigate from a broader level to a narrower, more
detailed level, such as navigating from product group to individual products.
Roll-up is the opposite of drill-down. Roll-up involves moving from a more detailed level to a broader
level of a hierarchical dimension.

* The pivot operator supports rearrangement of the dimensions in a data cube.


For example, the positions of the product and location dimensions can be reversed in the sales data
cube so that product appears on the columns and location on the rows.
The pivot operator allows the dimensions to be presented in the most appealing visual order.
Overview of Microsoft MDX
The Microsoft Multidimensional Expressions language (MDX) is the foundation for Microsoft SQL
Server Analysis Services and Microsoft Excel pivot tables.
Prominent open source projects using MDX are JPivot, Pivot4J and the Pentaho Business Analytics
platform.

An MDX cube consists of dimensions and measures.

This cube structure shows two measures,
- quantity and
- sales,
along with five dimensions,
- markets,
- customers,
- product,
- time, and
- order status.

In MDX, dimensions are composed of attributes.


For example, the product dimension has attributes
- line,
- vendor, and
- product.
* Attributes can be independent, or related in a hierarchy.
In this MDX cube example, attributes are hierarchically related.

* MDX has some unique terminology for data cubes.


- A tuple is a combination of members, one from each dimension. A tuple identifies a cell.
- Axis refers to a dimension from a source data cube used in a query. In MDX, cubes typically have
two axes, rows and columns, although more axes can be used.
- A slicer refers to a tuple used to filter the source data cube in an MDX query; it is specified in the WHERE clause.

Microsoft MDX Statements


Although an MDX SELECT statement appears similar to a SQL SELECT statement, there is a fundamental
difference: the result of a SQL SELECT statement is a table, while the result of an MDX SELECT
statement is a data cube.
Overview of Pivot4J
Pivot4J uses a pivot table interface to support data cubes in Microsoft MDX.
This pivot table contains the product label as the row heading with member values shown in
individual rows. The pivot table displays order status as the column heading, with members listed on
each column. The sales measure is also displayed below the order status members on the columns.

Pivot tables have become the standard interface for data cubes.
Overview of WebPivotTable

Pivot4J software demonstration


WEEK 3

Data Warehouse Design Practices and Methodologies


Relational database concepts for multidimensional data
A multidimensional representation of a data warehouse involves dimensions with attributes on the
axes of a data cube and measures in its cells.
A dimensional model for a reasonably sized data warehouse typically involves multiple data cubes,
sometimes sharing dimensions and measures.

A table design for a data warehouse typically consists of dimension tables connected to fact tables
in one-to-many relationships. The attributes of a dimension map to columns in a dimension
table. Fact tables contain measures as well as foreign keys to dimension tables.

A common error early in the design process is to omit dimensions related to a fact table.
The size of a fact table can be estimated using the cardinality of each dimension discounted by the
rate of sparsity. Sparsity increases as the number of dimensions increases and the cardinality of a
dimension increases.

example :
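As a hypothetical illustration (the numbers are assumptions, not the lecture's example): with 1,000
items, 10,000 customers, 50 stores, and 365 time periods, the full cube would contain
1,000 x 10,000 x 50 x 365 ≈ 182.5 billion cells. With 99.9% sparsity, only 0.1% of the cells hold facts,
giving an estimated fact table size of about 182.5 million rows.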

* Fact tables are classified based on the types of measure stored.


- A transaction table contains additive measures.
--> Typical transaction tables store measures about sales, web activity, and purchases.
- A snapshot table provides a periodic view of an asset level.
--> Typical snapshot tables store semi-additive measures about inventory levels, accounts
receivable balances, and accounts payable balances.
- A factless table records event occurrences, such as attendance, room reservations, and hiring.
--> Typically, factless tables contain foreign keys without any measures.
* A fact table may be a combination of these types.
- The names in black are dimensions, and
- the names in red are measures.

Table design patterns


The traditional schema pattern for a data warehouse is known as a star schema, consisting of one
fact table surrounded by dimension tables in one-to-many relationships.

The star schema can represent one data cube. This ERD consists of four dimension entity types, Item,
Customer, Store, and TimeDim, along with one transaction fact entity type called Sales.
When converted to a table design, the Sales table has foreign keys to each dimension table, Item,
Customer, Store, and TimeDim.
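A minimal sketch of the resulting fact table (measure and column names are assumptions, not the
course's official schema):

  -- Sales fact table: foreign keys to the four dimension tables plus measures
  CREATE TABLE Sales (
    SalesNo     INTEGER PRIMARY KEY,                          -- artificial identifier
    ItemId      INTEGER NOT NULL REFERENCES Item(ItemId),
    CustId      INTEGER NOT NULL REFERENCES Customer(CustId),
    StoreId     INTEGER NOT NULL REFERENCES Store(StoreId),
    TimeNo      INTEGER NOT NULL REFERENCES TimeDim(TimeNo),
    SalesUnits  INTEGER,                                      -- measure
    SalesDollar NUMBER(12,2)                                  -- measure
  );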

* For related business processes that share some of the dimension tables, a star schema can be
extended into a constellation schema with multiple fact entity types, as shown in this ERD.
- When converted to a table design, the Inventory entity type becomes a fact table, and its one-to-many
relationships become foreign keys in the fact table.
- The inventory entity type contains a number of measures, including the quantity on hand of an
item, the cost of an item and a quantity returned.
- All dimension tables are shared among both fact tables, except for the supplier and customer
tables.
The constellation pattern is important because it shows shared dimensions to represent multiple
data cubes.

* A snowflake schema has multiple levels of dimension tables related to one or more fact tables.
In this ERD, the Store dimension has been split into two entity types, Store and Division, along with a
one-to-many relationship from Division to Store.
You should consider the snowflake schema instead of the star schema for small dimension tables
that are not fully normalized.

The Store table in the previously shown star schema is not fully normalized because division id
determines division name and division manager.
* Time representation is crucial for data warehouses, because most data warehouse queries use
time in conditions.
The principal usage of time is to record the occurrence of facts. The simplest representation is a
timestamp data type for a column in a fact table.
In place of a timestamp column, many data warehouses use a foreign key to a time dimension table,
as shown in previous examples in this lesson.

Summarizability patterns for dimension tables


Summary computations for measures occur in drill-down and roll-up operations on a data cube and in
join operations combining fact and dimension tables.
Violations of summarizability conditions detract from the usability of data warehouses for
summary queries.
The most serious summarizability violations produce erroneous results.

Summarizability patterns involve the relationships among dimension levels and the relationships from
dimension tables to fact tables.

* In this example, the parent college level drills down to the department child level with a smaller
total. The enrollment in the business college is omitted in the department level because the business
college does not have departments.
* Roll up incompleteness is the reverse of drill down incompleteness.
Rolling up from a child, that is a finer level, to a parent, that is a coarser level, shows a smaller total
indicating that measured values attributed to child members have not been allocated to parent
members.
In this example, the product child level rolls up to the category parent level with a smaller total.

The values of minimum and maximum cardinalities determine schema patterns for the three
dimension summarizability problems.
- The drill-down incompleteness problem involves a minimum cardinality of zero for the parent
entity type, as shown in ERD A.
- The roll-up incompleteness problem involves a child's minimum cardinality of zero, as shown in
ERD B.

Summarizability patterns for dimension-fact relationships


Incomplete dimension-fact relationships involve fact entities that do not have a related parent
entity in a dimension-fact relationship.
Inconsistencies caused by incomplete relationships are manifest in join operations with summary
calculations.
The non-strict dimension-fact relationship problem involves double counting of measure values due to a
many-to-many relationship between dimension and fact tables.
Mini case for data warehouse design
- Using specifications of data sources and business needs, you will first specify dimensions and
measures, including important properties.
- Then you will determine the most appropriate grain and make relative size calculations about the
grain.
- You will create a table design for the data warehouse that supports the dimensional model and data
sources.
- You'll then identify summarizability problems, and suggest resolutions.
- Finally, you'll map the data sources and populate data warehouse tables using sample data from the
data sources.
Data warehouse design methodologies
A design methodology is a vital tool in data warehouse development, combining phases, labor and
automation, and project management.

Without an appropriate methodology, the best efforts will likely fail to produce a data warehouse
with high value for an organization.
Artifacts of a data warehouse design are dimensional models, a schema design using patterns
presented in other lessons, data integration procedures, and data marts to support business analysis.

* The demand-driven data warehouse design methodology, also known as the requirements-driven
approach, first proposed by Kimball in 1988, is one of the earliest data warehouse design
methodologies.
The demand-driven methodology has three phases: identifying data marts based on subsets of user
requirements, building a matrix relating data marts and dimensions, and designing fact
tables.

* The supply-driven methodology emphasizes the analysis of existing data sources.
Entities in ERDs of existing data sources are analyzed to provide a starting point for the data
warehouse design.
The supply-driven methodology has three phases: Classify Entities, Refine Dimensions, and Refine the
Schema.

* The hybrid data warehouse design methodology, proposed in 2001, combines the demand-driven and
supply-driven methodologies.

The hybrid methodology involves a demand-driven stage, a supply-driven stage, and then a third
stage to integrate the demand-driven and supply-driven stages.

The demand and supply stages could be done independently as shown in this diagram.

The overall emphasis in the hybrid approach is to balance the demand and supply aspects of data
warehouse design, possibly aided by automated tools. The demand-driven stage collects
requirements using the goal-question-metric (GQM) approach.
The GQM approach provides informal guidelines to define measures and dimensions from the
goals.

The second step of the hybrid methodology involves analysis of existing ERDs. The methodology
provides guidelines to identify fact and dimension tables in existing ERDs. Potential fact tables are
identified based on the number of additive attributes. Dimension tables are involved in one-to-many
relationships with fact tables.

The third step of the hybrid methodology integrates the dimensional model from the demand-driven
stage and the star schema from the supply-driven stage.

WEEK 4
Data Integration Concepts, Processes, and Techniques
Concepts of data integration processes
Data integration adds value to disparate data sources that contribute to enterprise data
warehouses. The primary goal of data integration is to provide a single source of truth for decision-
making.
Refresh processing updates the data warehouse with change data from source systems:
- The Preparation Phase manipulates change data from individual source systems.
- Extraction retrieves data from individual data sources.
- Transportation moves extracted data to a staging area.
- Cleaning involves a variety of tasks to standardize and improve the quality of the extracted
data.
- Auditing records results of the cleaning process, performing completeness
and reasonableness checks and handling exceptions.
- The integration phase merges separate clean sources into one source.
- Merging can involve removal of inconsistencies among the source data.
- Auditing records results of the merging process, performing completeness
and reasonableness checks, and handling exceptions.
- The update phase involves propagating the integrated change data to various parts of the data
warehouse.
After propagation, notification can be sent to user groups and administrators. In addition to
periodic refreshment, data integration involves initial population of a data warehouse.
Change data concepts
This lesson emphasizes change data used in data integration processes to populate and refresh a data
warehouse.

* Four types of change data, distinguished by two criteria:

- Source system requirements involve modifications to source systems to acquire change data.
Typical changes to source systems are new columns, such as timestamps required
for queryable change data, and trigger code required for cooperative change data.
- Processing level involves the resource consumption and development effort required for data
integration procedures.
Logged and snapshot change data involve substantial processing.

1) Cooperative change data involve notification from a source system about changes. The notification
typically occurs at transaction time using a trigger.

A trigger is a rule executed by a DBMS when an event occurs, such as inserting a new row.
A trigger involves software development and execution as part of a source system. Cooperative
change data can be input immediately into a data warehouse or placed in a queue or staging area
for later processing, possibly with other changes.
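A minimal Oracle trigger sketch (source table, queue table, and column names are hypothetical)
showing cooperative change data captured at transaction time:

  -- Record each insert/update on the source table in a change queue for later refresh
  CREATE OR REPLACE TRIGGER SalesOrder_ChangeCapture
  AFTER INSERT OR UPDATE ON SalesOrder
  FOR EACH ROW
  BEGIN
    INSERT INTO SalesOrderChangeQueue (OrderId, ChangeType, ChangeTime)
    VALUES (:NEW.OrderId,
            CASE WHEN INSERTING THEN 'I' ELSE 'U' END,
            SYSTIMESTAMP);
  END;
  /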
2) Logged change data involve files that record changes or other user activity.

For example, a transaction log contains every change made by transactions, and a web log contains
page access histories, called clickstreams, of web site visitors.

3) Queryable change data come directly from a data source via a query.

Queryable change data require time stamping in a data source. Since few data sources contain time
stamps for all data, queryable change data usually are augmented with other kinds of change data.

Queryable change data are most applicable for fact tables using columns such as order date,
shipment date and hire date, that are stored at operational data sources.
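A sketch of a queryable change data extraction (table, column, and log names are assumptions), pulling
rows recorded since the last refresh using a timestamp column:

  -- Orders added since the previous warehouse refresh
  SELECT OrderId, CustId, OrderDate, OrderAmount
  FROM OrderSource
  WHERE OrderDate > (SELECT MAX(LastRefreshTime) FROM RefreshLog);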

4) Snapshot change data involve periodic dumps of source data. To derive change data, a difference
operation uses the two most recent snapshots. The result of a difference operation is called a delta.

Generating a delta involves comparing source files to identify new rows, changed rows, and deleted
rows.
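A sketch of deriving a delta with a difference operation (Oracle's MINUS; snapshot table names are
hypothetical):

  -- New or changed rows: in the current snapshot but not the prior one
  SELECT * FROM CustomerSnapCurrent
  MINUS
  SELECT * FROM CustomerSnapPrior;

  -- Deleted rows: in the prior snapshot but not the current one
  SELECT * FROM CustomerSnapPrior
  MINUS
  SELECT * FROM CustomerSnapCurrent;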
Data cleaning tasks
Parsing decomposes complex objects, usually text, into their constituent parts.
For data integration, parsing is important for decomposing multi-purpose text data into individual
fields.
Pattern matching with regular expressions
Regular expressions specify patterns for parsing text fields with multiple components, common in
data integration tasks.

Regular expression tools are widely supported in data integration tools, DBMSs, application
programming interfaces, and testing web sites.
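For example, Oracle supports regular expressions through functions such as REGEXP_LIKE and
REGEXP_SUBSTR; a sketch (source table and column names are assumptions) that parses a combined
phone field:

  -- Extract area code and local number from a text phone field
  SELECT REGEXP_SUBSTR(PhoneText, '[0-9]{3}', 1, 1)      AS AreaCode,
         REGEXP_SUBSTR(PhoneText, '[0-9]{3}-[0-9]{4}')   AS LocalNumber
  FROM CustomerSource
  WHERE REGEXP_LIKE(PhoneText, '^\(?[0-9]{3}\)?[- ]?[0-9]{3}-[0-9]{4}$');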

(...)

Matching and consolidation


* Applications of entity matching:

Entity matching identifies duplicate records in two or more data sources when no common reliable
identifier exists.
The classic application involves identification of duplicate customers in data sources from different
firms.
Because a common identifier does not exist, duplicates must be identified from other common fields
such as names, address components, phone numbers and ages.
Because these common fields come from different data sources, inconsistent and non-standard
representations may exist, complicating the matching process.

Matched entities can be merged or linked. If merging two entities, sometimes old data from one
source is discarded. In addition, new fields can be added to capture data unique to each data
source. Linking maintains separate entities but notes relationships.

Quasi identifiers and distance functions for entity matching


Entity matching algorithms use quasi-identifiers to compensate for missing common
identifiers. Quasi-identifiers can be almost unique in combination. --> In a study published in 2000,
Sweeney demonstrated that 87% of the US population can be identified by a combination of gender,
birth date, and postal code.

Entity matching approaches use distance functions to determine if quasi-identifiers in two entities
indicate the same entity.
In a geometric sense, distance is the amount of space between two points.
For entity matching, a point is a combination of values, one from each quasi-identifier. Numeric
quasi-identifiers are easy to compare, but text quasi-identifiers can be difficult to compare.
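As one possibility, Oracle's UTL_MATCH package provides string distance functions; a hedged sketch
(source tables and columns are assumptions) that flags likely duplicates using edit distance on names
combined with exact matches on other quasi-identifiers:

  -- Candidate duplicate customers across two sources
  SELECT a.CustId AS IdA, b.CustId AS IdB,
         UTL_MATCH.EDIT_DISTANCE(a.LastName, b.LastName) AS NameDistance
  FROM CustomerSourceA a
    JOIN CustomerSourceB b
      ON a.BirthDate  = b.BirthDate
     AND a.PostalCode = b.PostalCode
  WHERE UTL_MATCH.EDIT_DISTANCE(a.LastName, b.LastName) <= 2;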
WEEK 5
Architectures, Features, and Details of Data Integration Tools
Architectures and marketplace
To support the complexity of data integration processes to populate and refresh a data warehouse,
software products for data integration have been developed.

* Data integration tools support two architectures.


a) The extraction, transformation, and loading architecture, abbreviated ETL, performs
transformation before loading, as shown in this diagram.
- Extraction, data cleaning, and other integration tasks are performed by a transformation engine
before loading into the target data warehouse tables.
- The transformation engine is independent of the DBMS for data warehouse tables.

b) The extraction, loading, and transformation architecture, abbreviated ELT, uses a relational
DBMS to perform transformations after extraction and loading.
- The ELT architecture is designed to utilize the high-performance features of enterprise DBMSs.
- ETL architecture supporters emphasize the DBMS independence of ETL engines, while ELT architecture
supporters emphasize the superior optimization technology in relational DBMS engines.
- ETL architectures can usually support more complex operations in a single transformation than ELT
architectures, but ELT architecture may use less network bandwidth.
- High execution indicates large firm size and resources to support current and future products.
- High vision indicates a firm with broad integrated product offerings.

Common features of data integration tools


This diagram divides features of data integration tools into essential and secondary categories.
- Integrated development environments (IDEs) support complex software projects with a source
code editor, visual specification tools, debugger, and code generator packaged in a convenient
graphical interface.

- Workflow and component specifications are the most salient parts of an IDE for data integration.
- A repository is a design database for all features of a data integration tool.
The repository stores design objects and relationships, maintains dependencies among design
objects, and provides documentation, with predefined reports and convenient formatting.

- Job management supports scheduling and monitoring of jobs for executing data integration
workflows.
Scheduling typically handles base schedules, repetition, conditional execution, and stepping through
a workflow to identify problems.
Monitoring provides logging of events, performance alerts, and reports.

- Data profiling helps a data warehouse administrator and data owners understand and approve
data quality and data sources. A data profiling tool can reduce unexpected delays and surprises in
populating and refreshing a data warehouse.

Talend Open Studio


Talend Open Studio is a prominent open source data integration product.

Pentaho Data Integration


Pentaho provides a unified platform for data integration, business analytics, and big data. Like
Talend, Pentaho uses the open core model, with an open source community edition and proprietary
extensions and commercial additions. Pentaho offers commercial products for data integration,
business analytics, and big data analytics.

Course conclusion
COURSE 3: Relational Database Support for Data Warehouses
https://www.coursera.org/learn/dwrelational
WEEK 1

DBMS Extensions and Example Data Warehouses


DBMS extensions

* The failure of operational databases to support higher-level decision making is due to a combination
of the inadequacy of database technology and limitations in how databases are employed.

- Performance limitations
Organizations and DBMS vendors discovered that a single database could not be configured to provide
adequate performance for both transaction processing and business intelligence processing.
- Lack of integration
Organizations discovered that the lack of integration among operational databases hindered higher
level decision making.
This lack of integration was not a design failure, as operational databases primarily support
transaction processing, not business intelligence processing.
- Missing data management features
Organizations experienced unexpected development difficulties and poor performance for business
intelligence applications.
These difficulties were unexpected because of the promise of non-procedural specification in SQL
and optimizing compilers.

Relational database schema patterns
* The traditional schema pattern for a data warehouse is known as a star schema,
--> consisting of one fact entity type or table, surrounded by dimension entity types or tables in
one-to-many relationships. The star schema represents a single data cube.

This ERD consists of four dimension entity types, Item, Customer, Store, and TimeDim, along with
one transaction fact entity type called Sales. When converted to a table design, the Sales table has
foreign keys to each dimension table, Item, Customer, Store, and TimeDim.

In some designs, the fact entity type borrows its primary key from the related dimension entity
types. Since fact tables can have many relationships, it is generally preferred to use an artificial
identifier rather than a large combined primary key.
In this example, SalesNo is an artificial identifier for the sales entity type.
* For related business processes that share some of the dimensions, a star schema can be extended
into a constellation schema with multiple fact entity types.

This constellation ERD contains two fact entity types, Sales and Inventory, along with five
dimension entity types shared among the fact entity types.

The constellation pattern is important because it shows shared dimensions to represent multiple
data cubes. A data warehouse for a medium-sized business may have many data cubes.
For a complex constellation diagram, a matrix provides a convenient mapping of dimensions and
facts. This matrix shows a mapping of the simplified schema diagram shown previously.

Item and TimeDim are shared in all three schemas.

Most dimensions appear in two schemas. Shared dimensions are conformed or standardized across
schemas.
Supplier and Store are used in only one schema each.

* You should consider the snowflake schema instead of the star schema for small dimension tables
that are not fully normalized.
- The Store table in the previously shown star schema is not fully normalized because division ID
determines division name and division manager. Since the Store table is relatively small, query
performance will not suffer much from the need to join the Division table in a snowflake design.
- For large dimension tables, such as customer, however, query performance may suffer with extra
join operations required in a snowflake design.
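A sketch of the snowflake split (column names and types are assumptions):

  -- Division attributes moved to their own table
  CREATE TABLE Division (
    DivId      INTEGER PRIMARY KEY,
    DivName    VARCHAR2(50),
    DivManager VARCHAR2(50)
  );

  -- Store keeps a foreign key to Division (the snowflake link)
  CREATE TABLE Store (
    StoreId     INTEGER PRIMARY KEY,
    StoreZip    VARCHAR2(10),
    StoreNation VARCHAR2(30),
    DivId       INTEGER NOT NULL REFERENCES Division(DivId)
  );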
* The inventory data warehouse supports business intelligence about inventory cycles. Inventory
bought, sold, consumed, and produced, is the heart of any manufacturing or distribution company.
- Inventory transactions are frequent and common.
- The volume and significance of inventory transactions makes them important in a data warehouse
design.
- The work order, sales, and purchase life cycles, affect the perpetual inventory balance as shown in
this diagram.
Before starting on assignments in Modules 2, 3, and 4, you need to create and populate the
inventory data warehouse tables. The course website contains files with the CREATE TABLE and INSERT
statements for each table.

Course software requirements


For this course, you need to install the Oracle database server. You should also install the SQL
Developer client to create and populate the tables used in this course.
- Overview of software requirements

- Overview of database software installation

- Oracle installation notes

- Making connections to a local Oracle database

- SQL statements for Store Sales tables

https://d3c33hcgiwev3.cloudfront.net/
_4b4ec83a9aa2fb6b0d7e85677d82dcac_SalesDWStatements.txt?
Expires=1464739200&Signature=g9QqP02gb-RDhz5AfRwhIki8RcEGxCJje-
4yR4nJebN34eN2Txnn87bzCYdkrrIFMLEPIzrznkoQNL-
RbWyEQ7vGMu75j5zj7F4dqRoUjpQYN9hTlxi7Btn29sj290brQPv9QrirBU0MPxUluqY9ypxxJjQLqAUJyO
5xp8a6-08_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A

- SQL statements for Inventory tables

The Inventory data warehouse background

InventoryDW.doc
The CREATE and INSERT statements for the Inventory data warehouse tables.

InventoryStarCREATEStatementOracle.txt

https://d3c33hcgiwev3.cloudfront.net/
_54e3cd972f071cc3211121f55ae1dfdf_InventoryStarCREATEStatementOracle.txt?
Expires=1464739200&Signature=UB4CCfmOi-tb~ujJNPeZSQqsplEUxJHvM-mY-TmisdMygg-
FUj1YKkE2c1ZJPv143kUrDl0TxZQR7EHlUfLrTccCBg1cyt47gB-
c4Xh~vuf5t6gRpKSCTthagxzMzm1Nn6e5i0F6oqfI~rJrw1yq2lrcVqzNedkVEaVy5V5rw-I_&Key-
Pair-Id=APKAJLTNE6QMUY6HBC5A

InventoryStarINSERTStatementOracle.txt

https://d3c33hcgiwev3.cloudfront.net/
_1e3a5ba95f42164c9e93d521fab08569_InventoryStarINSERTStatementOracle.txt?
Expires=1464739200&Signature=YxM8hwZBgE~4Kfy3llaGU81B9L-dF79-
6JTLhBlAobj8vvOSg3X2uUceihMrz1a6~KWfdknnOXS9VorP9GFaY6xt3xcnWKpj4JeVuFgS4E5XlO
rMa-LceSiCm0sOOUup4IFdz~E9zgSh3jm5aF6FLquPPNqJa1bszMN0Hm1W8Vk_&Key-Pair-
Id=APKAJLTNE6QMUY6HBC5A

WEEK 2

SQL Subtotal Operators


GROUP BY clause review
The GROUP BY clause produces row summaries, in which each row summary replaces a collection or
group of rows.
Each row summary contains grouping columns and summary values calculated with aggregate
functions.

An aggregate function calculates one value per set of rows.


The standard aggregate functions in SQL are MIN, MAX, COUNT, SUM, and AVG (average). Many other
aggregate functions are typically provided, depending on the DBMS.
* example from previous data model
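A sketch of a normal GROUP BY summary on the store sales schema (column names such as StoreState,
TimeMonth, and SalesDollar are assumptions):

  -- One summary row per state/month combination
  SELECT s.StoreState, t.TimeMonth, SUM(f.SalesDollar) AS TotalSales
  FROM Sales f
    JOIN Store s   ON f.StoreId = s.StoreId
    JOIN TimeDim t ON f.TimeNo = t.TimeNo
  GROUP BY s.StoreState, t.TimeMonth;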
SQL CUBE operator

The CUBE operator generates a complete set of subtotals for a collection of columns.
- It is appropriate only for independent columns, usually fewer than four columns.
- The column specification is order-independent because the operator generates all
possible subtotals. You can understand the results of a CUBE operation by comparison to the normal
GROUP BY results.

- The first seven rows are identical in both results.


- Dashes in the CUBE result indicate subtotal rows. For example, the eighth row, with CA for
California and a dash, shows the sum of sales, 175, for California across all months.
- This CUBE result, grouping on two columns, state and month, contains three sets of subtotal
rows:
subtotals for state,
subtotals for month,
and the grand total.

* example from previous data model
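A hedged sketch of the CUBE statement behind a result like the one described above (column names are
assumptions):

  -- Normal GROUP BY rows plus subtotals by state, subtotals by month, and a grand total
  SELECT s.StoreState, t.TimeMonth, SUM(f.SalesDollar) AS TotalSales
  FROM Sales f
    JOIN Store s   ON f.StoreId = s.StoreId
    JOIN TimeDim t ON f.TimeNo = t.TimeNo
  GROUP BY CUBE(s.StoreState, t.TimeMonth);
  -- Subtotal rows show NULL (displayed as a dash) for the summarized column,
  -- e.g. (CA, NULL) is the California total across all months.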


SQL ROLLUP operator
The ROLLUP operator generates a partial set of subtotals for a collection of columns. It is
appropriate for hierarchically related columns, such as columns that form a hierarchical dimension.

You can understand the results of a ROLLUP operation by comparison to the normal GROUP BY
results.
- The only syntax difference in the abbreviated statements is the ROLLUP operator in the GROUP
BY clause. Complete SELECT statements appear later in the lesson.
- The ROLLUP result contains additional rows for subtotals.
- The first 6 rows are identical in both results. Dashes in the ROLLUP results indicate subtotal
rows.
- For example, the seventh row shows the sum of sales as 325 in 2012, across all months.
- The last row, with two dashes, shows the grand total of 675.
- This ROLLUP result, grouping on 2 columns, year and month, contains 2 sets of subtotal rows:
subtotals for year and the grand total.
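A hedged sketch of the corresponding ROLLUP statement (column names are assumptions):

  -- Year/month detail rows, subtotals per year (month is NULL), and one grand total row
  SELECT t.TimeYear, t.TimeMonth, SUM(f.SalesDollar) AS TotalSales
  FROM Sales f
    JOIN TimeDim t ON f.TimeNo = t.TimeNo
  GROUP BY ROLLUP(t.TimeYear, t.TimeMonth);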
SQL GROUPING SETS operator
The GROUPING SETS operator provides complete flexibility to generate any set of subtotals; even the
normal GROUP BY results are not generated by default.
- To achieve this flexibility, you must provide an explicit specification of subtotal groups, which is
more tedious than the specification for the CUBE and ROLLUP operators.
- The explicit specification for the GROUPING SETS operator has some similarity with writing a
union query to generate subtotals. For both the union query and the GROUPING SETS operator, you
must know the collections of subtotals to generate.

This example shows a SELECT statement using the GROUPING SETS operator.

- The GROUPING SETS operator explicitly specifies the complete set of subtotals.

- The new syntax in this statement is the GROUPING SETS keyword with a list of subtotal groups
after the GROUP BY clause. The GROUPING SETS operation contains four subtotal groups: the
combination of store zip and time month, store zip by itself, time month by itself, and the
grand total, specified as empty parentheses.
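A hedged sketch of a GROUPING SETS statement with the four subtotal groups described above (column
names are assumptions):

  SELECT s.StoreZip, t.TimeMonth, SUM(f.SalesDollar) AS TotalSales
  FROM Sales f
    JOIN Store s   ON f.StoreId = s.StoreId
    JOIN TimeDim t ON f.TimeNo = t.TimeNo
  GROUP BY GROUPING SETS ((s.StoreZip, t.TimeMonth),   -- zip/month combination
                          (s.StoreZip),                -- zip subtotals
                          (t.TimeMonth),               -- month subtotals
                          ());                         -- grand total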

Variations of subtotal operators

.......... (skipped)


WEEK 3

SQL Analytic Functions


Processing Model and Basic Syntax

Extended Syntax and Ranking Functions

Window Comparisons I

Window Comparisons II

Functions for Ratio Comparisons

Quiz and Assignment

WEEK 4

Materialized View Processing and Design


Background on Traditional Views

Materialized view definition and processing

Query Rewriting Rules

Query Rewriting Examples

Oracle Tools for Data Integration

Quiz and Assignment

WEEK 5

Physical Design and Governance


Storage Architectures

Scalable Parallel Processing Approaches

Big data issues

Data Governance

Quiz and Assignment

Closing
COURSE 4: Business Intelligence Concepts, Tools, and Applications
https://www.coursera.org/learn/business-intelligence-tools
WEEK 1 :

Big Data: Why and Where


a) ...

WEEK 1

Decision Making and Decision Support Systems

Course Overview
Lesson 1.1: Overview of Decision Making

Lesson 1.2: Conceptual Foundations of Decision Making

Lesson 1.3: Decision Support Systems

Lesson 1.4: DSS in Practice

Module 1: Software Requirements, Practice Quiz, and Assignment

WEEK 2

Business Intelligence Concepts and Platform Capabilities


Lesson 2.1: BI Concepts

Lesson 2.2: BI Platform Capabilities

Lesson 2.3: Business Reporting

Lesson 2.4: BI OLAP Styles

Module 2: Practice Quiz, Peer Review Assignment, and Graded Quiz

Graded Quiz #1: Please complete this GRADED quiz, which covers content from Modules 1
and 2

WEEK 3

Data Visualization and Dashboard Design


Lesson 3.1: Data Visualization
Lesson 3.2: Data Visualization Guidelines and Pitfalls

Lesson 3.3: Performance Dashboards

Lesson 3.4: Dashboard Design Guidelines and Pitfalls

Module 3: Practice Quiz and Peer Review Assignment

WEEK 4

Business Performance Management Systems


Lesson 4.1: Business Performance Management

Lesson 4.2: Performance Measurement System

Lesson 4.3: Balanced Scorecards and Six Sigma

Lesson 4.4: Business Analytics

Module 4: Practice Quiz and Assignment

WEEK 5

BI Maturity, Strategy, and Summative Project


Lesson 5.1: BI Maturity

Lesson 5.2: BI Strategy

Graded Quiz #2: Please complete this GRADED quiz, which covers content from Modules 3,
4 and 5.

Closing Video
