EVERYTHING
YOU NEED TO KNOW
ABOUT:
OLAP & DATA MINING
part 2
workearly
Introduction.
OLAP: A list of benefits.
OLAP languages.
OLAP operations.
Data mining popular functions.
Steps to prevent an error-prone data
mining process.
OLAP and data mining side by side.
workearly.gr
INTRODUCTION
OLAP (online analytical processing) lets users perform calculations on typically large
amounts of information and provides essential insights from data drawn from various
sources and databases. The main feature of OLAP is its ability to represent data and the
relations between items in a multidimensional structure, displayed in the form of cubes.
Through this feature, a user can view data from various perspectives and carry out
holistic analysis by getting answers to ad hoc queries.
OLAP queries run very fast, even on large volumes of data, owing to precomputed
aggregations.
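A minimal sketch of why precomputed aggregations make queries fast, assuming a tiny
hypothetical fact table (the rows, dimension names, and the `query` helper are
illustrative and not taken from any OLAP product). Every combination of dimensions is
summed once up front, so a later query becomes a single lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, sales)
FACTS = [
    (2023, "North", "Laptop", 120),
    (2023, "South", "Laptop", 80),
    (2023, "North", "Phone", 200),
    (2024, "North", "Laptop", 150),
    (2024, "South", "Phone", 90),
]
DIMENSIONS = ("year", "region", "product")


def precompute_aggregates(facts):
    """Sum sales for every subset of the dimensions, once, ahead of query time."""
    cube = defaultdict(int)
    positions = range(len(DIMENSIONS))
    for *coords, sales in facts:
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(positions, size):
                # "*" marks a dimension that has been aggregated away.
                key = tuple(coords[i] if i in kept else "*" for i in positions)
                cube[key] += sales
    return dict(cube)


CUBE = precompute_aggregates(FACTS)


def query(year="*", region="*", product="*"):
    """Answer an aggregate query with one dictionary lookup."""
    return CUBE.get((year, region, product), 0)
```

For example, `query(year=2023)` reads the stored total for 2023 without touching the
fact rows again. Real OLAP engines usually materialize only some of these combinations,
because the number of combinations grows quickly with the number of dimensions.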
OLAP and data mining both allow users to handle large amounts of data and analyze it
from various angles.
OLAP: A LIST OF BENEFITS.
1) Structured: OLAP presents all data in a structured multidimensional view, so the
information can be examined from various perspectives.
2) Accessibility: Access to the data is provided with a high level of security.
3) Rapid: A multi-user client/server architecture delivers fast query responses.
4) Interactive: It keeps all business data in a form that can be conveniently
visualized through OLAP dashboards.
5) OLAP cube: It can be managed by any business employee and requires no
additional programming or analytical skills.
6) SQL: OLAP can take full advantage of SQL extensions, statements, and functions.
OLAP LANGUAGES
Two OLAP languages:
1) SQL, which stands for Structured Query Language, is a language developed to manage
relational databases and manipulate data. It works in two dimensions (rows and columns).
2) MDX, which stands for Multidimensional Expressions, is a language for expressing
analytical queries. MDX can reference multiple dimensions.
OLAP operations written in SQL and in MDX are quite similar.
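As a rough illustration of the SQL side only, the sketch below runs aggregate queries
through Python's built-in `sqlite3` module. The `sales` table, its columns, and the
`sales_by` helper are hypothetical; MDX is not shown because it needs a
multidimensional server to run against.

```python
import sqlite3

# In-memory database holding a small, made-up two-dimensional (row/column) table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, "North", "Laptop", 120),
        (2023, "South", "Laptop", 80),
        (2023, "North", "Phone", 200),
        (2024, "North", "Laptop", 150),
        (2024, "South", "Phone", 90),
    ],
)

ALLOWED_DIMENSIONS = {"year", "region", "product"}


def sales_by(*dimensions):
    """Total the amount for each combination of the requested dimensions."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension in {dimensions}")
    cols = ", ".join(dimensions)  # safe: names are checked against a whitelist
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()
```

Calling `sales_by("year")` gives one total per year, while `sales_by("year", "region")`
gives the finer-grained breakdown; the only thing that changes is the `GROUP BY` list.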
OLAP OPERATIONS:
1) Drill up
Drill up aggregates data in the cube, either by ascending a concept hierarchy for a
dimension or by reducing the number of dimensions, to give measures at a less
detailed granularity.
- A user groups columns and combines their values to see a broader perspective in
line with the concept hierarchy. With dimension reduction, one or more dimensions
are removed from the data cube. Some sources use drill up and roll up as synonyms.
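Both forms of drill up can be sketched on a small cube held as a dictionary. The
city-to-country hierarchy, the cell values, and the function names below are
assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Athens": "Greece", "Patras": "Greece", "Rome": "Italy"}

# Cube cells keyed by (city, product) with sales as the measure.
CELLS = {
    ("Athens", "Laptop"): 120,
    ("Patras", "Laptop"): 80,
    ("Rome", "Laptop"): 150,
    ("Athens", "Phone"): 200,
    ("Rome", "Phone"): 90,
}


def roll_up_location(cells):
    """Ascend the hierarchy: replace each city with its country and sum."""
    totals = defaultdict(int)
    for (city, product), sales in cells.items():
        totals[(CITY_TO_COUNTRY[city], product)] += sales
    return dict(totals)


def drop_product(cells):
    """Dimension reduction: remove the product dimension and sum over it."""
    totals = defaultdict(int)
    for (location, _product), sales in cells.items():
        totals[location] += sales
    return dict(totals)
```

Applying `roll_up_location` and then `drop_product` leaves one total per country, which
is the least detailed view this small cube can offer.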
2) Drill down
This operation is the opposite of drill up. It is carried out either by descending a
concept hierarchy for a dimension or by adding a new dimension. It lets a user move
from a less detailed cube to highly detailed data.
- One or more dimensions must be added to the data cube to provide more detailed
information.
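A short sketch of drill down along a time hierarchy (year to quarter), using made-up
quarterly figures and hypothetical function names. It also shows that the detailed
data has to be stored somewhere, since it cannot be recovered from yearly totals alone.

```python
from collections import defaultdict

# Hypothetical detailed cells keyed by (year, quarter).
QUARTERLY = {
    (2023, "Q1"): 100,
    (2023, "Q2"): 140,
    (2023, "Q3"): 90,
    (2023, "Q4"): 70,
    (2024, "Q1"): 110,
    (2024, "Q2"): 130,
}


def yearly_view(cells):
    """The less detailed starting point: one total per year."""
    totals = defaultdict(int)
    for (year, _quarter), sales in cells.items():
        totals[year] += sales
    return dict(totals)


def drill_down(cells, year):
    """Descend one level: show the quarters that make up a single year."""
    return {quarter: sales for (y, quarter), sales in cells.items() if y == year}
```

The quarterly values returned by `drill_down(QUARTERLY, 2023)` add up to the 2023 entry
of `yearly_view(QUARTERLY)`, so the two views stay consistent.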
3) Slice
This operation selects one specific dimension of a given cube and produces a
new sub-cube, which presents the information from another point of view.
- It creates the new sub-cube by fixing a single value on the chosen dimension.
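A minimal sketch of a slice, assuming a cube stored as a dictionary keyed by
(year, region, product); the data and the `slice_cube` name are hypothetical. Fixing
one value on one dimension yields a sub-cube with one dimension fewer.

```python
# Hypothetical cube cells keyed by (year, region, product).
CELLS = {
    (2023, "North", "Laptop"): 120,
    (2023, "South", "Laptop"): 80,
    (2023, "North", "Phone"): 200,
    (2024, "North", "Laptop"): 150,
    (2024, "South", "Phone"): 90,
}


def slice_cube(cells, axis, value):
    """Keep cells where dimension `axis` equals `value`, then drop that axis."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cells.items()
        if key[axis] == value
    }


# Slice on year = 2023: the result is keyed by (region, product) only.
sales_2023 = slice_cube(CELLS, 0, 2023)
```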
4) Dice
This operation selects two or more dimensions of a given cube and produces
a new sub-cube, just like the Slice operation.
- To locate a single value in a cube, a value is specified for each dimension.
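A dice can be sketched the same way, again with a hypothetical dictionary cube and a
made-up `dice` helper. Unlike a slice, it filters on several dimensions at once and
keeps every dimension in the result.

```python
# Hypothetical cube cells keyed by (year, region, product).
CELLS = {
    (2023, "North", "Laptop"): 120,
    (2023, "South", "Laptop"): 80,
    (2023, "North", "Phone"): 200,
    (2024, "North", "Laptop"): 150,
    (2024, "South", "Phone"): 90,
}


def dice(cells, allowed):
    """Keep cells whose members are permitted on every listed dimension.

    `allowed` maps a dimension index to the set of members to keep.
    """
    return {
        key: measure
        for key, measure in cells.items()
        if all(key[axis] in members for axis, members in allowed.items())
    }


# Dice on two dimensions: year in {2023} and product in {"Laptop"}.
laptops_2023 = dice(CELLS, {0: {2023}, 2: {"Laptop"}})

# Specifying one member for every dimension pins down a single cell.
single_cell = dice(CELLS, {0: {2024}, 1: {"South"}, 2: {"Phone"}})
```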
5) Pivot
This operation rotates the axes of a cube to provide an alternative view of the data.
Pivot regroups the data along other dimensions, which helps analyze the
performance of a company or enterprise.
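For a two-dimensional view, rotating the axes amounts to swapping rows and columns.
The sketch below assumes a hypothetical region-by-product table stored as nested
dictionaries.

```python
# Hypothetical view: rows are regions, columns are products.
BY_REGION = {
    "North": {"Laptop": 270, "Phone": 200},
    "South": {"Laptop": 80, "Phone": 90},
}


def pivot(table):
    """Swap the row and column axes of a nested-dictionary table."""
    rotated = {}
    for row, columns in table.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated


# After the pivot, rows are products and columns are regions.
BY_PRODUCT = pivot(BY_REGION)
```

The values are unchanged; only the orientation differs, so pivoting twice returns the
original table.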
6) Scoping
Scoping restricts the view of the database objects to a specified subset, within which
users can retrieve and update data values. It is most useful when there is a huge
amount of data and a user needs to limit access to a specified subset.
7) Screening
This limits the set of data extracted.
8) Drill across
This operation combines cells from several data cubes that share the same schema.
9) Drill-through
This operation lets users navigate from data at the lowest level of a cube to the data in
the operational systems from which the cube was derived. It is usually used to identify
the cause of outlier values in a data cube.
10) Sort
This operation returns the cube with the members of a dimension sorted.
11) Add Measure
Users can add new measures to a cube.
12) Drop Measure
Users can drop a measure from a data cube.
13) Union
Users can combine cubes that have the same schema but different instances.
14) Difference
This operation removes from a cube the cells that also belong to another cube. Both
cubes must have the same schema.
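Union and Difference can be sketched on two small cubes that share a
(year, product) schema. The data and function names are hypothetical, and two choices
here are assumptions rather than anything stated above: a cell "belongs" to a cube when
its coordinates appear in it, and a union refuses to merge cells whose values disagree.

```python
# Two hypothetical cubes with the same schema: (year, product) -> sales.
CUBE_A = {(2023, "Laptop"): 120, (2023, "Phone"): 200}
CUBE_B = {(2023, "Phone"): 200, (2024, "Laptop"): 150}


def union(cube_a, cube_b):
    """Combine the cells of both cubes into one cube."""
    merged = dict(cube_a)
    for coords, value in cube_b.items():
        # Assumption: overlapping cells must agree, otherwise the merge is ambiguous.
        if coords in merged and merged[coords] != value:
            raise ValueError(f"conflicting values for cell {coords}")
        merged[coords] = value
    return merged


def difference(cube_a, cube_b):
    """Keep only the cells of cube_a whose coordinates are absent from cube_b."""
    return {coords: value for coords, value in cube_a.items() if coords not in cube_b}
```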
DATA MINING POPULAR FUNCTIONS:
Attribute Importance: Identifies and ranks the attributes that are most significant for
predicting a target attribute.
Association: Examines sales data to find out which items are typically bought
together.
Classification: Groups items into classes to learn how to classify a new item.
Clustering: Searches for natural groupings of items in the data, for example
to determine customer segments.
Feature Extraction: Generates meaningful derived features, which reduces data
redundancy.
Regression: Helps predict approximate results that may occur in the future.
Outlier Detection: Identifies anomalies in the system.
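Of the functions listed above, Association is the easiest to illustrate in a few lines.
The sketch below counts how often each pair of items appears in the same basket and
keeps the pairs whose support (share of baskets containing the pair) reaches a
threshold. The baskets and the `frequent_pairs` name are made up, and real association
mining (for example the Apriori algorithm) goes further than pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "butter", "coffee"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs bought together in at least `min_support` of baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical order, e.g. (bread, butter).
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }
```

With a threshold of 0.5, only bread and butter qualify in this data, since they appear
together in three of the four baskets.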
STEPS TO PREVENT AN ERROR-PRONE DATA MINING PROCESS
1) Big Databases
It is better to conduct data mining in a native database system.
2) Diverse Databases
Efficient data processing depends on multiple databases, all of which must be supported
by the data warehouse in use.
3) Relational or Complex Types of Data
Data mining technologies may involve clustering, classification, association,
characterization, and prediction.
Tight integration assists in rapid interactive mining using the tools below:
• Aggregate queries to examine graph databases.
• Data visualization through an OLAP dashboard.
• Meta-rule guided mining.
• Statistical analysis in an OLAP multidimensional database.
• Sub-graph histogram representation for classification of images.
4) Data Gaps
When there are gaps in the data, a user should take additional measures to solve the
problem:
• Independent component analysis (ICA) and self-organizing maps (SOM) handle data
with gaps by estimating the missing information from the available data.
• Parametric and non-parametric imputation methods fill in the missing values.
• Multi-task learning supports pattern classification with missing inputs.
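As a very simple illustration of imputation, the sketch below replaces each missing
value with the mean of the observed ones. This is plain mean imputation, chosen for
brevity; it is not ICA, SOM, or multi-task learning, and the `impute_mean` name and the
use of `None` to mark a gap are assumptions.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical monthly readings with two gaps.
readings = [10, None, 20, None, 30]
completed = impute_mean(readings)
```

Mean imputation keeps the average of the column unchanged but understates its
variability, which is one reason the more elaborate methods above exist.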
5) Strong Performance
Parallel data mining applications make it easier to adopt support vector machines, tune
scalable data mining, and provide scalable, parallel performance for data mining
algorithms.
OLAP AND DATA MINING SIDE BY SIDE:
Users:
OLAP and data mining have different users. OLAP is designed for average employees, while
data mining is used by business statisticians and strategists with professional skills.
Used Individually:
- Data mining and OLAP may operate separately.
- OLAP enhances the overall productivity of a business.
- Data mining is ideal for users interested in future prospects.
- Data mining tools are designed for specialists with particular skills.
- OLAP is easily understood and is often sufficient for those who need only reporting and
multidimensional analysis.
Advanced Analytics:
Data mining provides advanced analytics for detecting items that are commonly bought
together. It can also identify the demographics.
With OLAP, a user can conduct in-depth data analysis to identify current trends
in demand and act on them.
Supplement:
They can reinforce each other.
If OLAP uncovers general issues, data mining tools can help analyze more detailed
information about particular clients.
While OLAP monitors and tracks results, data mining predicts future income and its
growth based on the given data. As can be seen, used together the two systems can
yield more substantive insights.