Analyzing Dependability Data with AMBER

The AMBER Project aims to assess, measure, and benchmark resilience in computer systems, supported by the European Commission. It focuses on creating a data repository for dependability data, facilitating analysis, comparison, and sharing of results among researchers. The tutorial presented at DEPEND 2009 outlines the challenges in data analysis and the objectives of the AMBER Data Repository, emphasizing the need for standardized tools and methodologies.


Tutorial

Using the AMBER Data Repository to Analyze, Share and Cross-exploit Dependability Data

Marco Vieira
[email protected]
University of Coimbra, Portugal

The Second International Conference on Dependability (DEPEND 2009)
Athens/Glyfada, Greece, June 18, 2009

The AMBER Project

• Assessing, Measuring and Benchmarking Resilience in computer systems and components (AMBER)
• Coordination Action supported by the European Commission in the 7th FP
• Coordinating and advancing research in resilience measurement and benchmarking in computer systems and infrastructures

Current challenges

• Quality of measurements
• Integration of the human and technical components of the analysis
• Dynamic and adaptive systems and networks
• Integration with the development processes

AMBER objectives

• State-of-the-art survey
• Research agenda
• Data repository
• Others:
  – Dissemination events (workshops, panels, etc.)
  – Benchmarking tools
  – Training material

This Tutorial…

Learn how to use the AMBER Data Repository to analyze and share data from dependability evaluation experiments.

Problems

• How to analyze the usually large amount of raw data produced in dependability evaluation experiments?
• How to compare results from different experiments, or results of similar experiments across different systems?
  – Different and incompatible tools, data formats, and setup details…
• How to share raw experimental results among research teams?

Current situation

• The situation today is not good!
• Spreadsheets and other specific tools are used to analyze results
  – Not standard and difficult to build
• Difficult to compare data and generalize conclusions
• Researchers share final results and conclusions
  – Papers, mainly
  – Raw data is not shared

ADR vision and objectives

• Vision
  – Become a worldwide repository for dependability-related data
• Key objectives:
  – Provide state-of-the-art data analysis
  – Allow data comparison and cross-exploitation
  – Facilitate worldwide data sharing and dissemination
• A potential tool to increase the impact of research

Data analysis approach

• Repository to analyze, compare, and share results
• Use a business intelligence approach:
  – Data warehouse to store data
  – On-Line Analytical Processing (OLAP) to analyze data
  – Data mining algorithms to identify (unknown) phenomena in the data
  – Information retrieval for data in textual formats
• Adopt the same life cycle as BI data

Outline

1. Business Intelligence
2. Data Warehousing & OLAP
3. Using DW to analyze dependability-related data
4. The AMBER Data Repository

1. Business Intelligence

What is Business Intelligence?

• Business Intelligence (BI):
  – Getting the right information, to the right decision makers, at the right time
• BI is an enterprise-wide platform that supports data gathering, reporting, analysis, and decision making
• BI is meant for:
  – Fact-based decision making
  – A "single version of the truth"
• BI includes reporting and analytics

Five classic BI questions

• What happened? (past)
• What is happening? Why did it happen? (present)
• What will happen? What do I want to happen? (future)

Typical BI technologies

• ETL tools (Extract, Transform, and Load)
• Repositories
  – Data warehouse
• Analytical tools
  – Reporting and querying
  – OLAP
  – Data mining
• Information retrieval

Many proprietary products

ACE*COMM, Ab Initio, Actuate, Applix, Business Objects, Cognos, ComArch, CyberQuery, Dimensional Insight, Hyperion Solutions, IBM, InetSoft, Informatica, Information Builders, LogiXML, LucidEra, Microsoft (Analysis Services, PerformancePoint Server 2007), MicroStrategy, Oracle Corporation, OutlookSoft, Panorama Software, Pervasive, Pilot Software, Inc., PRELYTIS, Proclarity, Prospero Business Suite, Qliktech, SAP (Business Inf. Warehouse), SAS Institute, Siebel Systems, Spotfire (now Tibco), SPSS, StatSoft, Telerik Reporting, Teradata, Thomson Data Analyzer

Some open source/free products

• Eclipse BIRT Project
• Freereporting.com
• JasperSoft
• OpenI
• Palo (OLAP database)
• Pentaho
• RapidMiner
• SpagoBI
• Weka

• Some products from big companies can be used freely

2. Data Warehousing & OLAP

What is a Data Warehouse?

• A big database that stores data for decision support
• Built from the operational data collected from transactional databases and other operational systems

(figure: operational DBs and other systems feed the data warehouse, which is accessed by the users)

Basic DW components

(figure: data sources (operational DBs, legacy systems, spreadsheets/files, external sources) feed a data staging area, which loads the data warehouse (presentation servers); users access it through ad hoc queries, reports, specific applications, models and other tools)

Data volume

• Less than 20 GB: small dimension; runs on a PC
• From 20 to 100 GB: medium dimension; needs a powerful workstation
• From 100 GB to 1 TB: large dimension; needs a powerful server, normally with parallel processing
• More than 1 TB: very large dimension; massive parallel processing

Some characteristics

• Temporal dependency
• Non-volatile
• Target oriented
• Data integration and consistency
• Designed for queries

Temporal dependency

• The data is collected over time
  – It does not represent a specific moment
  – It represents the history
• A temporal reference must be associated with all data in the database

Non-volatile

• The data in the DW is never updated
• The DW stores historic data (historic memory) collected from the operational databases
• After being loaded (from the operational databases) there is only one operation:
  – Queries

Target oriented

• The data warehouse must only store data relevant for decision support
• Much of the operational data (needed for everyday management) is not relevant for the DW

Data integration and consistency

• In an operational environment the information may be stored in different locations using different representations
• That data must be integrated and made consistent before being loaded into the DW

Designed for queries

• After being loaded the data never changes: only queries are allowed
• The DW stores a large amount of data
• The data must be stored in a way that improves performance:
  – Multidimensional view
  – Partial denormalization

Dimensional model

• Typical model in operational databases: E/R
• The dimensional model follows a different approach:
  – Stores the same data
  – Data organization is user oriented
• Easy to understand
• Very good performance for queries
• Data warehouses built over complex E/R models never succeed

The multidimensional model

• Facts are stored in a multidimensional array
• The dimensions are used to index the array
• Usually built using data from operational databases

(figure: a Sales cube indexed by Store (Lisbon, Coimbra), Product (Milk, Oil, Sugar, Coffee), and Date (Jan–Apr))

Star model

• The typical dimensional model is a star structure with:
  – A central table with facts
  – Several dimension tables describing the facts

(figure: a facts table with foreign keys ID_dim1…ID_dim4 and measures Fact 1…Fact n, surrounded by four dimension tables, each with its key and attributes)

Facts

• Represent the business measures
• The most useful facts are:
  – Numbers
  – Additive

Facts table

• Comprises several numeric attributes (facts) and foreign keys to the dimensions
• Normalized table
• M:1 relationships with the business dimensions
• Normally contains a large number of records
• Typically represents 95% of the space used by the DW

Dimensions

• Each dimension represents a business parameter
  – Time, clients, products, etc.
• Each represents an entry point for the analysis of the facts
• Each represents a different point of view for the analysis of the facts

Dimension tables

• Strongly denormalized
  – For performance
• Dimensions have hierarchies
  – Day → Month → Year → …
• Contain a large set of attributes
• Typically comprise a small number of records (when compared to the facts table)

Star schema example

(figure: a Sale facts table (ID_time, ID_product, ID_store, Units_sold, Purchase_cost, Sale_value, Num_clients) linked to the dimensions Time (ID_time, Day, Day_of_week, Week_of_year, Month, Trimester, Year), Product (ID_product, Name, Type, Brand, Category, Pack, Description), and Store (ID_store, Name, Local, District, Area, Num_tellers))
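To make the denormalized dimension and the fact table concrete, here is a minimal sketch of the slide's example schema in SQLite via Python. Table and column names follow the slide; the sample row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized Time dimension: the Day -> Month -> Trimester -> Year
# hierarchy is flattened into the columns of a single table.
cur.execute("""
    CREATE TABLE time_dim (
        id_time      INTEGER PRIMARY KEY,
        day          TEXT,
        day_of_week  TEXT,
        week_of_year INTEGER,
        month        TEXT,
        trimester    TEXT,
        year         INTEGER)""")

# Sale facts table: foreign keys to the dimensions plus numeric, additive facts.
cur.execute("""
    CREATE TABLE sale (
        id_time       INTEGER REFERENCES time_dim(id_time),
        id_product    INTEGER,
        id_store      INTEGER,
        units_sold    INTEGER,
        purchase_cost REAL,
        sale_value    REAL,
        num_clients   INTEGER)""")

# One dimension row carries the whole hierarchy, so queries can group by any
# level (day, month, trimester, year) without extra joins.
cur.execute("INSERT INTO time_dim VALUES "
            "(1, '2009-06-18', 'Thursday', 25, 'June', 'Q2', 2009)")
print(cur.execute("SELECT year, trimester, month FROM time_dim").fetchone())
```

Carrying the whole hierarchy in each dimension row is exactly the "strongly denormalized, for performance" trade-off the slide describes.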

Low level queries

(figure: the sales star schema from the previous slide)

Example query (average sales amount per brand and month):

select brand, month, avg(sale_value * units_sold)
from sale
join time on sale.ID_time = time.ID_time
join product on sale.ID_product = product.ID_product
group by brand, month

User interfaces

• Explore data in data warehouses
  – Typical OLAP tools
    • Access the relational engine using SQL
    • Data presentation using tables, graphics, reports, etc.
    • Targeted at ad hoc queries
  – Other tools
    • Data mining
    • Modeling
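The query above can be run end to end. A minimal sketch using SQLite through Python; the table names follow the slide's schema, while the rows and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE product  (id_product INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE time_dim (id_time INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE sale     (id_time INTEGER, id_product INTEGER,
                           units_sold INTEGER, sale_value REAL);
    INSERT INTO product  VALUES (1, 'Milk 1L', 'BrandA'), (2, 'Coffee', 'BrandB');
    INSERT INTO time_dim VALUES (1, 'Jan', 2009), (2, 'Feb', 2009);
    INSERT INTO sale     VALUES (1, 1, 10, 2.0), (1, 2, 5, 4.0), (2, 1, 8, 2.0);
""")

# Average sales amount per brand and month: the joins resolve the dimension
# keys, and the GROUP BY picks the level of analysis.
rows = cur.execute("""
    SELECT p.brand, t.month, AVG(s.sale_value * s.units_sold)
    FROM sale s
    JOIN time_dim t ON s.id_time = t.id_time
    JOIN product p  ON s.id_product = p.id_product
    GROUP BY p.brand, t.month
    ORDER BY p.brand, t.month
""").fetchall()
print(rows)  # [('BrandA', 'Feb', 16.0), ('BrandA', 'Jan', 20.0), ('BrandB', 'Jan', 20.0)]
```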


Queries: Slice and Dice

(figure: slicing and dicing the sales cube, e.g., sales by time and product, or sales by store and brand)

Drill-Down & Roll-Up

(figure: drill-down moves from the most generic category, through intermediate categories, to the most detailed category and full detail; roll-up is the reverse)

Time: Drill-Down & Roll-Up

(figure: the time hierarchy ALL → Year → Trimester → Month → Week → Day; drill-down descends the hierarchy, roll-up ascends it)

Steps for the design of the star model

1. Identify the business process/activity
2. Identify the facts
3. Identify the dimensions
4. Define the data granularity
   • Day, Week, Month, …
   • Product, Category, …
   • Store, City, …

Do not forget that the model depends on the data available (operational databases, files, etc.)
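With a flattened time dimension, drill-down and roll-up reduce to the same aggregation run at different levels of the hierarchy. A minimal sketch in SQLite via Python; the data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE time_dim (id_time INTEGER PRIMARY KEY,
                           day TEXT, month TEXT, trimester TEXT, year INTEGER);
    CREATE TABLE sale (id_time INTEGER, sale_value REAL);
    INSERT INTO time_dim VALUES
        (1, '2009-01-10', 'Jan', 'Q1', 2009),
        (2, '2009-02-05', 'Feb', 'Q1', 2009),
        (3, '2009-04-02', 'Apr', 'Q2', 2009);
    INSERT INTO sale VALUES (1, 100.0), (2, 50.0), (3, 70.0);
""")

def total_sales(level):
    # Roll-up / drill-down = the same aggregation at a coarser or finer level
    # of the flattened time hierarchy (day -> month -> trimester -> year).
    # `level` is a trusted column name chosen by the analyst, not user input.
    return cur.execute(f"""
        SELECT t.{level}, SUM(s.sale_value)
        FROM sale s JOIN time_dim t ON s.id_time = t.id_time
        GROUP BY t.{level} ORDER BY t.{level}
    """).fetchall()

print(total_sales("trimester"))  # roll-up:    [('Q1', 150.0), ('Q2', 70.0)]
print(total_sales("month"))      # drill-down: one row per month
```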

Example: Retail sales

• Set of stores belonging to the same enterprise
• Goal: analysis of sales
• Each store has several departments (food, hygiene and cleaning, etc.)
• Sells thousands of products
• Products are identified using a unique number

Retail sales: Business data

• Where to collect the data?
  – POS (point of sale)
  – Operational database
• What to measure?
  – Sales
• Goals?
  – Maximize the profit
  – Maximum sales price possible
  – Lower costs
  – More clients

Retail sales: Facts

• Examples of relevant decision support facts:
  – Number of units sold
  – Acquisition costs
  – Sale value
  – Number of clients that bought the product
• Question: is it possible to obtain base data (from the operational system) for these facts?

Retail sales: Dimensions

• Main dimensions:
  – Product x Store x Time
• Are there other relevant dimensions?
  – Supplier? Promotions? Client?
  – Employee responsible for the store on that day?
• It is normally possible to add extra dimensions
• All the dimensions have a 1:M relationship with the facts

Retail sales

(figure: the full retail star schema, with a sales facts table (ID_product, ID_time, ID_store, ID_promotion, units_sold, purchase_cost, sale_value, num_clients) linked to detailed Product, Time, Store, and Promotion dimension tables)

Granularity

• Example: record the daily sales for all products
  – Analyze in detail (price, quantity, etc.) the products sold every day, in each store, …
• Retail sales granularity:
  – Product x Store x Promotion x Day
• The granularity defines the detail of the DW and has a strong impact on its size
• The granularity must be adjusted to the analysis requirements
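Granularity's impact on size can be seen with a back-of-the-envelope calculation. All numbers below are invented assumptions, not figures from the tutorial.

```python
# Back-of-the-envelope fact-table sizing for the Product x Store x
# Promotion x Day grain. Every number here is an illustrative assumption.
products_sold_per_store_per_day = 2000   # distinct products with sales per store/day
stores = 100
days = 3 * 365                           # three years of history

# One fact row per combination actually observed at the chosen grain.
rows = products_sold_per_store_per_day * stores * days

bytes_per_row = 40                       # surrogate keys plus a few numeric facts
print(f"{rows:,} rows, ~{rows * bytes_per_row / 1e9:.2f} GB")
```

Coarsening the grain (e.g., dropping Promotion, or aggregating to weeks) divides the row count accordingly, which is why the grain must match the analysis requirements rather than default to maximum detail.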

Retail sales: Details (Product and Time dimensions)

• Product dimension:
  – Must characterize the products as seen by the business management
  – Must contain the attributes that are relevant for posterior queries
  – The key is typically generated in a synthetic manner
  – It is a strongly denormalized table (which is also typical in the other dimensions)
• Time dimension:
  – Mandatory dimension that represents the DW temporal dependency
  – Must describe time as seen by the business management
  – It is not generated from the operational databases
  – Includes all the records representing the time period considered in the DW

(figure: the retail star schema with these callouts attached to the Product and Time dimension tables)

Retail sales: Details (Promotion and Store dimensions)

• Promotion dimension:
  – Characterizes the existing promotions; in this example there is only one dimension related to promotions
  – A very important dimension: managers want to know the impact of promotions on sales, in order to target new promotions to specific products, stores, and times
• Store dimension:
  – Must characterize the stores as seen by the business management
  – Must contain the attributes that are relevant for posterior queries
  – Includes geographical attributes (localization)
  – Includes time attributes (opening date, …)

(figure: the retail star schema with these callouts attached to the Promotion and Store dimension tables)

More than one star

(figure: Sales and Stock facts tables sharing the Time and Product dimensions; Sales also links to Store, and Stock to Warehouse)

• Two or more stars can be connected using one or more dimensions
• Shared dimensions must be conformed
  – Contain consistent data when considering each star
• Drill across: a query that crosses more than one star

Several stars

• Sales: dimensions Time, Component, Client, Contract
• Orders: dimensions Time, Component, Supplier, Contract
• Stocks: dimensions Time, Component, Warehouse
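A drill-across query can be sketched as two aggregations, one per star, joined on the conformed dimension. A minimal illustration in SQLite via Python; the tables and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE product (id_product INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales   (id_product INTEGER, units_sold INTEGER);
    CREATE TABLE stock   (id_product INTEGER, quant_available INTEGER);
    INSERT INTO product VALUES (1, 'Milk'), (2, 'Coffee');
    INSERT INTO sales   VALUES (1, 30), (2, 12);
    INSERT INTO stock   VALUES (1, 200), (2, 40);
""")

# Drill across: aggregate each star separately at the grain of the conformed
# Product dimension, then join the two result sets on that dimension. The
# conformed dimension makes the two stars' numbers directly comparable.
rows = cur.execute("""
    SELECT p.name, s.total_sold, k.total_avail
    FROM (SELECT id_product, SUM(units_sold) AS total_sold
          FROM sales GROUP BY id_product) s
    JOIN (SELECT id_product, SUM(quant_available) AS total_avail
          FROM stock GROUP BY id_product) k
      ON s.id_product = k.id_product
    JOIN product p ON p.id_product = s.id_product
    ORDER BY p.name
""").fetchall()
print(rows)  # [('Coffee', 12, 40), ('Milk', 30, 200)]
```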

Questions?

3. Using DW to analyze dependability data

Basic elements of a DW

(figure: operational DBs, legacy systems, spreadsheets/files, and external sources feed a data warehouse; a multidimensional server and OLAP applications support result analysis through ad hoc queries, statistical tools, and reporting)

A DW for experimental data

(figure: experiments (fault injection tools, robustness testing tools, dependability benchmarking, statistical experiments, field data, any other experimental environment) run on experimental systems A…N and feed, over a LAN/Internet, a data warehouse analyzed through a multidimensional server and OLAP applications)

Key points of the proposed approach

(figure: experimental setups A…N feed a multidimensional database (data warehouse); an OLAP tool provides ad hoc queries, statistical analysis, and reporting over the net)

• General approach to store results from dependability evaluation experiments
• Data from different experiments can be compared/cross-exploited (only if it makes sense to compare)
• Raw data is available (not only the final results)
• Results can be analyzed and shared worldwide by using web-enabled versions of OLAP tools

Two types of data in experimental dependability evaluation

(figure: an experiment management system sends experiment control data and fault definitions to a target system over the network and collects readouts (impact of faults))

Two types of data:
• Measures collected from the target system (FACTS)
  – For example, raw data representing error detection efficiency, recovery time, failure modes, etc.
• Features of the target system and experimental setup that have impact on the measures (DIMENSIONS)
  – For example, attributes describing the target systems, the different configurations, the workload, the faultload, etc.
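The facts/dimensions split above maps directly onto a star schema. A minimal sketch in SQLite via Python; all table names, attributes, and values are invented for illustration and are not the actual AMBER schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    -- Dimensions: features of the target system and setup that influence
    -- the measures.
    CREATE TABLE target_system (id_target INTEGER PRIMARY KEY,
                                name TEXT, os TEXT, configuration TEXT);
    CREATE TABLE faultload (id_fault INTEGER PRIMARY KEY, fault_type TEXT);
    CREATE TABLE workload  (id_workload INTEGER PRIMARY KEY, name TEXT);

    -- Facts: measures collected from the target system in each run.
    CREATE TABLE experiment_result (
        id_target INTEGER, id_fault INTEGER, id_workload INTEGER,
        recovery_time_s REAL, integrity_errors INTEGER, lost_transactions INTEGER);

    INSERT INTO target_system VALUES (1, 'System A', 'Linux', 'default');
    INSERT INTO faultload VALUES (1, 'operator fault');
    INSERT INTO workload  VALUES (1, 'TPC-C like');
    INSERT INTO experiment_result VALUES (1, 1, 1, 42.5, 0, 3);
""")

# The same OLAP-style query shape as in the retail example, now grouping
# a dependability measure by target system and fault type.
row = cur.execute("""
    SELECT t.name, f.fault_type, AVG(r.recovery_time_s)
    FROM experiment_result r
    JOIN target_system t ON r.id_target = t.id_target
    JOIN faultload f ON r.id_fault = f.id_fault
    GROUP BY t.name, f.fault_type
""").fetchone()
print(row)  # ('System A', 'operator fault', 42.5)
```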

The multidimensional model

• Facts are stored in a multidimensional array
• Dimensions are used to access the array according to any possible criteria

(figure: a cube indexed by Target system (System A, System B), Faultload, and Workload)

The star schema

(figure omitted in the extracted slides)

Basic elements of the proposed approach

(figure: experimental setups A…N → loading applications → data warehouse → analysis through the net, ad hoc queries, statistical tools, and reporting)

• Experimental setups: used as they are. You can use your favorite dependability evaluation tool and do the experiments in the usual way. It is only necessary:
  – To know the format of the raw results
  – To have access to the results
• Loading applications:
  – General-purpose loading applications
  – Some transformations of the data are normally necessary for consistency
• Data warehouse:
  – Raw data is available in a standard star schema (facts + dimensions)
  – If results from different experiments are compatible and can be compared/analyzed together, they are stored in the same star schema (or in schemas that share at least one dimension)
  – If results are from different, unrelated experiments, they are stored in a separate schema
• Analysis:
  – Commercial OLAP tools are used to analyze the raw data and compute the measures. These tools are designed to be used by managers: very easy to use :-)
  – Only an internet browser is needed to analyze the data

Steps needed to put our approach into practice

1. Definition of the adequate star schema to store the data. Create the tables in the data warehouse
2. Use a general-purpose loading application to define the loading plans for each table in the star schema
3. Run the loading plans to load the star tables with the raw data collected from the experiments
4. Every time a new experiment is done, the corresponding loading plans are run again to add the new data to the data warehouse
5. Analyze the data: calculate measures, find unexpected results, analyze trends, etc.

Example: Recovery and Performance Evaluation in DBMS

• Tuning a large DBMS is very complex
• Administrators tend to focus on performance tuning and disregard the recovery features
• Administrators seldom have feedback on how good a given configuration is
• A technique to characterize the performance and the recoverability of a DBMS is needed

Operator fault injection and recovery

(figure omitted in the extracted slides)

The approach

• Extend existing performance benchmarks to evaluate recoverability features in DBMS
• Include a faultload and new measures
• Faultload based on operator faults
• Measures related to recovery:
  – Recovery time
  – Data integrity violations
  – Lost transactions

Experimental setup

(figure omitted in the extracted slides)

The data storage model

(figure omitted in the extracted slides)

Steps towards data analysis

1. Definition of the adequate star schema
   a. Identify the process/activity
   b. Identify the facts
   c. Identify the dimensions
   d. Define the data granularity
2. Load the data
3. Analyze the data

Definition of the adequate star schema: Identify the process/activity

• Experiments to characterize the performance and the recoverability of a DBMS
• Includes a faultload and new measures
• Faultload based on operator faults
• Measures related to recovery

Definition of the adequate star schema: Identify the facts

(figure omitted in the extracted slides)

Definition of the adequate star schema: Identify the dimensions

(figure omitted in the extracted slides)

Definition of the adequate star schema: Define the data granularity

• Performance and recovery results:
  – Per experiment
  – Per SUT
  – Per workload
  – Per fault type

The star schema

(figure omitted in the extracted slides)

Load the data

(figure: the ETL process, omitted in the extracted slides)

Analyze the data: Example of query construction

(figure omitted in the extracted slides)

Analyze the data: Example of query answer

(figure omitted in the extracted slides)

Questions?

4. The AMBER Data Repository

AMBER Repository vision and objectives

• Vision
  – Become a worldwide repository for dependability-related data
• Key objectives:
  – Provide state-of-the-art data analysis
  – Allow data comparison and cross-exploitation
  – Facilitate worldwide data sharing and dissemination
• A potential tool to increase the impact of research

Potential use

• Research team level
  – Perform the analysis of data in an efficient way
  – Efficient dissemination of the team's results
• Project level
  – Sharing and cross-exploitation of results from different project teams
• Worldwide
  – Common repository to store and share data
  – Many teams are performing dependability evaluation, but no results are available on the web

Data analysis approach

• Repository to analyze, compare, and share results
• Use a business intelligence approach:
  – Data warehouse to store data
  – On-Line Analytical Processing (OLAP) to analyze data
  – Data mining algorithms to identify (unknown) phenomena in the data
  – Information retrieval to access data in textual formats
• Adopt the same life cycle as BI data
• Use technology already available for DW, DM & IR

Steps

1. User registration
2. Multidimensional analysis
3. Definition of the loading plans
4. Load the data
5. Definition of data ownership policies
6. Analysis of the data
   • Analyze DBench-OLTP results using OLAP

User registration

• ADR users must undergo a registration procedure
• Provide identification information that is verified by the ADR support team
  – To filter malicious users
• Contact information is used to get in touch with the potential repository user
• To access the repository, users must authenticate

Multidimensional analysis

• Design an adequate multidimensional data model
• If the user has the required expertise to design the data model:
− Send the SQL scripts needed to create the database tables to the ADR support team
• Otherwise, the ADR team helps the user define the model
− The user only needs to explain to us the experimental setup and the format of the data collected

The DBench-OLTP benchmark

Format of the raw data

• Raw data collected by DBench-OLTP consists of tens of CSV files (one from each run)
• Each row contains data from an injection slot
− Identification, duration, number of transactions executed, data integrity errors discovered, type of fault injected, moment of fault injection, workload used, etc.
• A text file describes the experiment and the characteristics of the SUB (system under benchmarking)

Data model (1)

• Key steps:
− Identification of the facts that characterize the problem under analysis
− Identification of the dimensions that may influence the facts
− Definition of the granularity of the data stored in the star schema
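The key steps above (facts, dimensions, granularity) can be sketched as a minimal star schema for injection-slot data. All table and column names below are illustrative assumptions, not the actual ADR model:

```python
import sqlite3

# Illustrative star schema (names are assumptions, not the real ADR schema).
# Granularity: one fact row per injection slot of a benchmark run.
schema = """
CREATE TABLE dim_fault (fault_id INTEGER PRIMARY KEY, fault_type TEXT);
CREATE TABLE dim_workload (workload_id INTEGER PRIMARY KEY, workload_name TEXT);
CREATE TABLE dim_system (system_id INTEGER PRIMARY KEY, dbms TEXT, os TEXT);
CREATE TABLE fact_injection_slot (
    slot_id INTEGER PRIMARY KEY,
    fault_id INTEGER REFERENCES dim_fault,        -- dimension keys
    workload_id INTEGER REFERENCES dim_workload,
    system_id INTEGER REFERENCES dim_system,
    duration_s REAL,                              -- measures (the facts)
    tx_executed INTEGER,
    integrity_errors INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The fact table holds the measures at injection-slot granularity; each dimension table captures one axis of analysis (fault type, workload, system under benchmarking).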


Data model (2)

Definition of the loading plans

• Data extraction
− SQL scripts to extract data from the CSV files into a temporary database schema (data staging area)
• Data transformation
− SQL scripts to transform the data into an adequate format
• Data load
− SQL scripts to load the transformed data into the data warehouse
• Loading plans are documented and stored in the ADR
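A loading plan of this extract–transform–load kind can be sketched end to end. The CSV layout, table names and column names below are assumptions for illustration only:

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV from one benchmark run (columns are assumptions).
raw_csv = """slot,duration,tx,errors,fault_type
1,300,1250,0,operator
2,300,1198,2,software
"""

conn = sqlite3.connect(":memory:")

# Extraction: copy the raw rows unchanged into a staging table
# (the temporary schema acting as data staging area).
conn.execute("CREATE TABLE staging (slot TEXT, duration TEXT, tx TEXT,"
             " errors TEXT, fault_type TEXT)")
rows = list(csv.DictReader(io.StringIO(raw_csv)))
conn.executemany("INSERT INTO staging VALUES (:slot, :duration, :tx,"
                 " :errors, :fault_type)", rows)

# Transformation + load: cast to proper types while moving the data
# from the staging area into the warehouse fact table.
conn.execute("CREATE TABLE fact_slot (slot INTEGER, duration_s REAL,"
             " tx_executed INTEGER, integrity_errors INTEGER, fault_type TEXT)")
conn.execute("""INSERT INTO fact_slot
                SELECT CAST(slot AS INTEGER), CAST(duration AS REAL),
                       CAST(tx AS INTEGER), CAST(errors AS INTEGER), fault_type
                FROM staging""")

loaded = conn.execute("SELECT COUNT(*), SUM(tx_executed) FROM fact_slot").fetchone()
print(loaded)  # (2, 2448)
```

Because the three phases are plain scripts, rerunning the plan when new CSV files arrive (the point made on the next slide) is just a matter of executing them again.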

Load the data

• Execute the loading plans created before
• If new data becomes available, we just need to rerun the plans
− e.g., if the benchmark is executed on other systems
• The documentation of DBench-OLTP includes papers and technical reports
− These are considered part of the DBench-OLTP data
− They are loaded into the repository and made available to potential readers of the data

Data ownership policy

• Data ownership policies of the ADR are divided into three main groups
− Private data
− Proprietary data
− Collaborative data
• For the DBench-OLTP data we decided to use a collaborative approach
− Allows other potential users of the benchmark to compare their results with the ones available in the ADR

Analysis of the data

• On-Line Analytical Processing (OLAP) tools
− Support the analysis in a very flexible way
− Provide high query performance and easy, intuitive data navigation
• Oracle Business Intelligence Discoverer Plus (ODP)
− Commercial tool included in the Oracle Business Intelligence package
− Widely used by industry
− Can be used freely for research purposes under an Oracle Academy Agreement

OLAP Wizard

• Selection of the query type (crosstab or table) and characteristics (title, graph, text area, etc.)
• Selection of measures and dimensional attributes
• Setting the query layout
• Selection of the fields used to sort the results
• Creation of parameters used to filter the data
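The kind of crosstab the wizard builds (a measure aggregated over one dimension on the rows and another on the columns) can be approximated in plain SQL. The data and names below are illustrative assumptions, not ADR results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_slot (fault_type TEXT, workload TEXT,"
             " tx_executed INTEGER)")
conn.executemany("INSERT INTO fact_slot VALUES (?, ?, ?)", [
    ("operator", "wl-A", 1250), ("operator", "wl-B", 1100),
    ("software", "wl-A", 900),  ("software", "wl-B", 950),
])

# Crosstab: fault types on the rows, one column per workload,
# with the measure (transactions executed) aggregated in each cell.
crosstab = conn.execute("""
    SELECT fault_type,
           SUM(CASE WHEN workload = 'wl-A' THEN tx_executed ELSE 0 END) AS wl_A,
           SUM(CASE WHEN workload = 'wl-B' THEN tx_executed ELSE 0 END) AS wl_B
    FROM fact_slot
    GROUP BY fault_type
    ORDER BY fault_type
""").fetchall()
print(crosstab)  # [('operator', 1250, 1100), ('software', 900, 950)]
```

An OLAP tool generates such queries automatically from the selected measures and dimensional attributes, and lets the user pivot, drill down, or filter without writing SQL.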

DEPEND 2009 16
Marco Vieira, University of Coimbra, Portugal

Some results

• Murphy's law…

Quick demo…

http://www.amber-project.eu

Do you have data? Share them!

Questions?

Generic bibliography

• Ralph Kimball, Margy Ross, "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling" (Second Edition), J. Wiley & Sons, Inc., 2002.
• Ralph Kimball, "The Data Warehouse Lifecycle Toolkit", J. Wiley & Sons, Inc., 2001.

ADR bibliography

• Madeira, H., Costa, J., Vieira, M., "The OLAP and Data Warehousing Approaches for Analysis and Sharing of Results from Dependability Evaluation Experiments", International Conference on Dependable Systems and Networks, DSN-DCC 2003, San Francisco, CA, USA, June 2003.
• Pintér, G., Madeira, H., Vieira, M., Pataricza, A., Majzik, I., "A Data Mining Approach to Identify Key Factors in Dependability Experiments", Fifth European Dependable Computing Conference (EDCC-5), Budapest, Hungary, April 2005.


ADR bibliography

• Pintér, G., Madeira, H., Vieira, M., Majzik, I., Pataricza, A., "Integration of OLAP and Data Mining for Analysis of Results from Dependability Evaluation Experiments", International Journal of Knowledge Management Studies (IJKMS), Volume 2, Issue 4, Inderscience Publishers, July 2008.
• Vieira, M., Mendes, N., Durães, J., Madeira, H., "The AMBER Data Repository", DSN 2008 Workshop on Resilience Assessment and Dependability Benchmarking (DSN-RADB08), Anchorage, Alaska, June 2008.
• Vieira, M., Mendes, N., Durães, J., "A Case Study on Using the AMBER Data Repository for Experimental Data Analysis", SRDS 2008 Workshop on Sharing Field Data and Experiment Measurements on Resilience of Distributed Computing Systems, Naples, Italy, October 2008.

