Data Mining
Shorts
1. What is data cleaning? Explain in detail.
Data cleaning is the process of correcting or deleting inaccurate, damaged, improperly formatted, duplicated, or insufficient data from a dataset. Even if results and algorithms appear to be correct, they are unreliable if the data is inaccurate. There are numerous ways for data to be duplicated or incorrectly labeled when merging multiple data sources.
Steps for Cleaning Data
Step 1: Remove duplicate or irrelevant observations
Step 2: Fix structural errors
Step 3: Filter unwanted outliers
Step 4: Handle missing data
Step 5: Validate and QA
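The five steps can be illustrated with a small sketch in plain Python. The records, the `clean` function, and the `max_age` range rule are assumptions made up for this example; real projects would choose their own duplicate key, outlier rule, and fill strategy.

```python
from statistics import median

# Hypothetical raw records; names and values are invented for illustration.
raw = [
    {"id": 1, "city": "delhi ", "age": 34},
    {"id": 1, "city": "delhi ", "age": 34},   # duplicate observation
    {"id": 2, "city": "Delhi", "age": None},  # missing value
    {"id": 3, "city": "MUMBAI", "age": 29},
    {"id": 4, "city": "Mumbai", "age": 290},  # implausible outlier
    {"id": 5, "city": "Pune", "age": 41},
]

def clean(records, max_age=120):
    # Step 1: remove duplicate observations (same id seen before).
    seen, rows = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            rows.append(dict(r))
    # Step 2: fix structural errors (stray spaces, inconsistent capitalization).
    for r in rows:
        r["city"] = r["city"].strip().title()
    # Step 3: filter unwanted outliers (here: a simple range rule).
    rows = [r for r in rows if r["age"] is None or 0 <= r["age"] <= max_age]
    # Step 4: handle missing data by filling with the median of known ages.
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = median(known)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill
    # Step 5: validate the result before handing it on.
    assert all(r["age"] is not None for r in rows)
    return rows

cleaned = clean(raw)
```

Filling with the median is only one of several options for Step 4; dropping the incomplete rows is another.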
Techniques for Cleaning Data
* Ignore the tuples
* Fill the missing value
* Binning
* Regression
* Clustering
Characteristics of Data Cleaning
* Accuracy
* Coherence
* Validity
* Uniformity
* Data verification
* Clean data backflow
Benefits of Data Cleaning
When you have clean data, you can make decisions using the highest-quality information and eventually boost productivity. The following are some important advantages of data cleaning in data mining:
* Removal of inaccuracies when several data sources are involved.
* Clients are happier and employees are less annoyed when there are fewer mistakes.
* The capacity to map out the many functions and the planned uses of your data.
* Making decisions more quickly and with greater efficiency will be possible with the use of data cleaning tools.
2. List out the components of a data warehouse.
A data warehouse is used to store historical data, which helps to make strategic decisions for the business. It is used for Online Analytical Processing (OLAP), which helps to analyze the data. The data warehouse helps business executives in systematically organizing, understanding, and using their data to make strategic decisions.
A typical data warehouse has four main components: a central database, ETL (extract, transform, load) tools, metadata, and access tools. All of these components are engineered for speed so that you can get results quickly and analyze data on the fly.
* Central database: A database serves as the foundation of your data warehouse. Traditionally, these have been standard relational databases running on premise or in the cloud.
* Data integration: Data is pulled from source systems and modified to align the information for rapid analytical consumption using a variety of data integration approaches such as ETL (extract, transform, load) and ELT, as well as real-time data replication, bulk-load processing, data transformation, and data quality and enrichment services.
* Metadata: Metadata is data about your data. It specifies the source, usage, values, and other features of the data sets in your data warehouse. There is business metadata, which adds context to your data, and technical metadata, which describes how to access data, including where it resides and how it is structured.
* Data warehouse access tools: Access tools allow users to interact with the data in your data warehouse. Examples of access tools include query and reporting tools, application development tools, data mining tools, and OLAP tools.
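The data integration component can be sketched as a tiny extract, transform, load pipeline. This is only an illustration using Python's standard `csv` and `sqlite3` modules; the `SOURCE_CSV` text, the `orders` table, and the three function names are assumptions, and an in-memory SQLite database stands in for the central database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a source system.
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,USD
"""

def extract(text):
    # Extract: read raw rows from the source as dictionaries.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: drop rows with no amount, convert types, normalize currency codes.
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return out

def load(conn, rows):
    # Load: write the cleaned rows into the warehouse table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
```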
3. List out the OLAP operations.
OLAP stands for Online Analytical Processing. OLAP operations are a set of techniques used in data mining and business intelligence for analyzing large data sets. OLAP operations enable users to extract insights from multidimensional data sets quickly and efficiently. These operations help users explore data from multiple perspectives, gain insights into data relationships, and make informed decisions based on data analysis.
One example of an OLAP operation is the "Slice" operation. This operation allows users to extract data from a multidimensional cube by selecting a single dimension and a specific value for that dimension.
Operations in OLAP
Drill Down
The Drill Down OLAP operation allows users to view more detailed data by expanding a particular dimension in a multidimensional cube. For example, a user can drill down into the product dimension to view data for individual products, or a user can expand quarterly sales data into monthly sales figures.
Drill Up
It is the opposite operation of Drill Down. The Drill Up OLAP operation allows users to view data at a higher level of aggregation by collapsing a specific dimension in a multidimensional cube.
Slice
The Slice OLAP operation allows users to extract data from a multidimensional cube by selecting a single dimension and a specific value for that dimension.
Dice
The Dice OLAP operation allows users to extract data from a multidimensional cube by selecting multiple dimensions and specific values for each selected dimension.
Pivot
The Pivot OLAP operation allows users to rotate the orientation of a multidimensional cube to view the data from a different perspective.
Scoping
The Scoping OLAP operation involves restricting the database view to a specified subset.
Screening
The Screening OLAP operation involves restricting the set of data retrieved against the data or members of a dimension.
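Slice and Dice can be sketched on a toy cube held in a Python dictionary. The cube contents, the `DIMS` tuple, and the helper names `slice_cube` and `dice_cube` are assumptions for illustration only, not the interface of any OLAP product.

```python
# A tiny cube stored as {(product, region, quarter): sales}; values are invented.
cube = {
    ("Pen", "North", "Q1"): 100,
    ("Pen", "South", "Q1"): 80,
    ("Pen", "North", "Q2"): 120,
    ("Book", "North", "Q1"): 200,
    ("Book", "South", "Q2"): 150,
}
DIMS = ("product", "region", "quarter")

def slice_cube(cube, dim, value):
    # Slice: fix one dimension to a single value.
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, selections):
    # Dice: keep cells whose value is in the allowed set for every selected dimension.
    idx = {DIMS.index(d): set(vals) for d, vals in selections.items()}
    return {k: v for k, v in cube.items()
            if all(k[i] in vals for i, vals in idx.items())}

q1 = slice_cube(cube, "quarter", "Q1")
north_pens = dice_cube(cube, {"product": ["Pen"], "region": ["North"]})
```

Here `q1` keeps the three Q1 cells, and `north_pens` keeps the two cells for pens sold in the North.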
Other operations: Drill through, Sort, Add measure, Drop measure, Union, Difference.
4. Define data cube. Explain in detail.
A data cube refers to a multidimensional data structure. That is, data within the data cube is explained by specific dimensional values.
Data Cube Classification
The data cube can be classified into two categories:
* Multidimensional data cube: It basically helps in storing large amounts of data by making use of a multidimensional array. It increases its efficiency by keeping an index of each dimension; thus, it is able to retrieve data fast.
* Relational data cube: It basically helps in storing large amounts of data by making use of relational tables. Each relational table displays the dimensions of the data cube. It is slower compared to a multidimensional data cube.
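The difference between the two storage styles can be sketched in a few lines of Python. All names and values here are assumptions for illustration; nested lists stand in for a multidimensional array and a list of tuples stands in for a relational table.

```python
# Dimension members; each index maps a member to its array position.
products = ["Pen", "Book"]
quarters = ["Q1", "Q2", "Q3"]
product_index = {p: i for i, p in enumerate(products)}
quarter_index = {q: i for i, q in enumerate(quarters)}

# Multidimensional form: a 2-D array addressed by position.
array_cube = [[0] * len(quarters) for _ in products]

# Relational form: one row per (product, quarter, sales) fact.
table_cube = [("Pen", "Q1", 100), ("Pen", "Q3", 90), ("Book", "Q2", 150)]

for product, quarter, sales in table_cube:
    array_cube[product_index[product]][quarter_index[quarter]] = sales

def lookup_array(product, quarter):
    # Two index lookups, then direct positional access.
    return array_cube[product_index[product]][quarter_index[quarter]]

def lookup_table(product, quarter):
    # Without extra indexes, the table form scans rows to find a match.
    for p, q, sales in table_cube:
        if p == product and q == quarter:
            return sales
    return 0
```

The array form answers a lookup by position but reserves a cell for every combination, including empty ones; the table form stores only the facts that exist.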
Data Cube Operations
There are mainly 5 operations, listed below:
* Roll-up: This operation aggregates certain similar data attributes having the same dimension together.
* Drill-down: This operation is the reverse of the roll-up operation. It allows us to take particular information and then subdivide it further for finer granularity analysis.
* Slicing: This operation filters out the unnecessary portions.
* Dicing: This operation does a multidimensional cutting; it not only cuts one dimension but can also go to another dimension and cut a certain range of it.
* Pivot: This operation is very important from a viewing point of view. It basically transforms the data cube in terms of view.
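Roll-up, drill-down, and pivot can be sketched on a small list of facts. The data and the `aggregate` helper are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical monthly sales facts: (product, quarter, month, amount).
facts = [
    ("Pen", "Q1", "Jan", 40), ("Pen", "Q1", "Feb", 60),
    ("Pen", "Q2", "Apr", 70), ("Book", "Q1", "Mar", 90),
    ("Book", "Q2", "May", 30),
]

def aggregate(facts, key):
    # Sum amounts for every distinct value of key(product, quarter, month).
    totals = defaultdict(int)
    for product, quarter, month, amount in facts:
        totals[key(product, quarter, month)] += amount
    return dict(totals)

# Roll-up: aggregate months up to the quarter level.
by_quarter = aggregate(facts, lambda p, q, m: (p, q))

# Drill-down: return to the finer month level for one product and quarter.
pen_q1_months = {m: a for p, q, m, a in facts if p == "Pen" and q == "Q1"}

# Pivot: swap the axes so quarters come first and products second.
pivoted = {(q, p): total for (p, q), total in by_quarter.items()}
```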
Advantages
* Multidimensional analysis
* Interactivity
* Speed and efficiency
Disadvantages
* Complexity
* Data size limitation
* Performance issues
5. Define star schema. Explain in detail.
A star schema is the elementary form of a dimensional model, in which data are organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or login. A dimension includes reference data about the fact, such as date, item, or customer.
[Figure: Star Schema, showing dimension tables]
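A star schema can be sketched with Python's built-in `sqlite3` module: one fact table whose rows point at the date, item, and customer dimension tables. The table names, column names, and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold reference data about the facts.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
-- The fact table records each measured event (a sale) and its dimension keys.
CREATE TABLE fact_sales (
    date_id INTEGER REFERENCES dim_date(date_id),
    item_id INTEGER REFERENCES dim_item(item_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount REAL
);
INSERT INTO dim_date VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO dim_item VALUES (1, 'Pen'), (2, 'Book');
INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO fact_sales VALUES (1, 1, 1, 10.0), (1, 2, 2, 25.0), (2, 1, 2, 5.0);
""")

# A typical star-schema query: join the fact table to one dimension and aggregate.
sales_by_item = conn.execute("""
    SELECT i.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_id = f.item_id
    GROUP BY i.name
    ORDER BY i.name
""").fetchall()
```

Each query needs at most one join per dimension, which is part of why the design is easy to understand and query.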
Characteristics of Star Schema
The star schema is intensely suitable for data warehouse database design because of the following features:
* It creates a de-normalized database that can quickly provide query responses.
* It provides a flexible design that can be changed easily or added to throughout the development cycle, and as the database grows.
* It provides a parallel in design to how end-users typically think of and use the data.
* It reduces the complexity of metadata for both developers and end-users.
Advantages
* Query performance
* Load performance and administration
* Built-in referential integrity
* Easily understood
Disadvantages
* Data redundancy
* Limited dimension depth
* Data inconsistency
* Rigidity
Essays
1. Differentiate OLTP and OLAP with features.
* OLAP stands for Online Analytical Processing. OLAP systems have the capability to analyze database information of multiple systems at the current time. The primary goal of OLAP service is data analysis and not data processing.
* OLTP stands for Online Transaction Processing. OLTP has the work to administer day-to-day transactions in any organization. The main goal of OLTP is data processing, not data analysis.
Features of OLAP and OLTP
* Data type
OLTP uses real-time and transactional data from a single source, while OLAP uses historical and aggregated data from multiple sources.
* Database
OLTP uses a relational database that can handle multiple concurrent transactions, while OLAP uses a data warehouse that consolidates multiple data sources.
* Data view
OLTP focuses on current data, while OLAP generates and validates insights from data compiled over time.
* Purpose
OLTP is designed for real-time transaction processing, such as online banking, shopping, and order entry, while OLAP is designed for analyzing large volumes of data for decision-making, such as forecasting, planning, and budgeting.
* Users
OLTP systems are designed for use by frontline workers like cashiers or bank tellers, or for customer self-service applications. OLAP systems are designed for use by data scientists, business analysts, and knowledge workers.
Difference between OLAP and OLTP
Category | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing)
Definition | It is well-known as an online database query management system. | It is well-known as an online database modifying system.
Data source | Consists of historical data from various databases. | Consists of only operational current data.
Method used | It makes use of a data warehouse. | It makes use of a standard database management system (DBMS).
Application | It is subject-oriented. Used for data mining, analytics, decision making, etc. | It is application-oriented. Used for business tasks.
Normalized | In an OLAP database, tables are not normalized. | In an OLTP database, tables are normalized (3NF).
Usage of data | The data is used in planning, problem-solving, and decision-making. | The data is used to perform day-to-day fundamental operations.
Task | It provides a multidimensional view of different business tasks. | It reveals a snapshot of present business tasks.
Purpose | It serves the purpose to extract information for analysis and decision-making. | It serves the purpose to insert, update, and delete information from the database.
Volume of data | A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small, in MB and GB, as the historical data is archived.
Queries | Relatively slow, as the amount of data involved is large. Queries may take hours. | Very fast, as the queries operate on 5% of the data.
Update | The OLAP database is not often updated. As a result, data integrity is unaffected. | The data integrity constraint must be maintained in an OLTP database.
Backup and recovery | It only needs backup from time to time as compared to OLTP. | The backup and recovery process is maintained rigorously.
Processing time | The processing of complex queries can take a lengthy time. | It is comparatively fast in processing because of simple and straightforward queries.
Types of users | This data is generally managed by the CEO, MD, and GM. | This data is managed by clerks and managers.
Operations | Only read and rarely write operations. | Both read and write operations.
Updates | With lengthy, scheduled batch operations, data is refreshed on a regular basis. | The user initiates data updates, which are brief and quick.
Nature of audience | The process is focused on the market. | The process is focused on the customer.
Database design | Design with a focus on the subject. | Design that is focused on the application.
Productivity | Improves the efficiency of business analysts. | Enhances the user's productivity.
Example applications | OLAP is good for analyzing trends, predicting customer behavior, and identifying profitability. | OLTP is good for processing payments, customer data management, and order processing.
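The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are assumptions for illustration; a real OLAP system would normally run against a separate data warehouse rather than the transactional database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL, year INTEGER)"
)

# OLTP-style work: short transactions that insert or update single rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (customer, amount, year) VALUES (?, ?, ?)",
                 ("Asha", 20.0, 2023))
    conn.execute("INSERT INTO orders (customer, amount, year) VALUES (?, ?, ?)",
                 ("Ravi", 35.0, 2023))
    conn.execute("INSERT INTO orders (customer, amount, year) VALUES (?, ?, ?)",
                 ("Asha", 15.0, 2024))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (40.0, 2))

# OLAP-style work: a read-only query that aggregates history.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```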
2. Explain the three-tier data warehouse architecture with a neat diagram.
Data warehouses usually have a three-level (tier) architecture that includes:
1. Bottom Tier (Data Warehouse Server)
2. Middle Tier (OLAP Server)
3. Top Tier (Front-end Tools)
A bottom tier that consists of the data warehouse server, which is almost always an RDBMS. It may include several specialized data marts and a metadata repository.
A middle tier which consists of an OLAP server for fast querying of the data warehouse. The OLAP server is implemented using either:
1. A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that maps functions on multidimensional data to standard relational operations.
2. A Multidimensional OLAP (MOLAP) model, i.e., a particular purpose server that directly implements multidimensional information and operations.
A top tier that contains front-end tools for displaying results provided by OLAP, as well as additional tools for data mining of the OLAP-generated data.
[Figure: Three-Tier Architecture for a data warehouse system. Labels: external data, ETL tools, data warehouse layer, data marts, reporting tools, OLAP tools, data mining tools, analysis tools]
The overall data warehouse architecture is shown in the figure.
[Figure: Three-Tier Data Warehouse Architecture. Labels: monitoring, administration, metadata, operational databases, external sources]
Principles of Data Warehousing
* Load performance
Data warehouses require incremental loading of new data on a periodic basis within narrow time windows. Performance of the load process should be measured in hundreds of millions of rows and gigabytes per hour.
* Load processing
Many phases must be taken to load new or updated data into the data warehouse, including data conversion, filtering, reformatting, indexing, and metadata update.
* Data quality management
Fact-based management demands the highest data quality. The warehouse ensures local consistency, global consistency, and referential integrity despite "dirty" sources and massive database size.
* Query performance
Fact-based management must not be slowed by the performance of the data warehouse RDBMS; large, complex queries must complete in seconds, not days.
* Terabyte scalability
Data warehouse sizes are growing at astonishing rates. Today, these range in size from a few to hundreds of gigabytes and terabyte-sized data warehouses.
3. (a) Compare the types of OLAP servers.
(b) Discuss about efficient computation of data cubes.
(a) OLAP Servers
.
S51Ng (OLNP) refers to a Sc&
OnITne Analytical Pr ras ‘s
Ce used for cata analysts tr
E we € ee
Order to make business Aectsiong. OLAP Provicles
a: a a a
Msignes from Gatabases = /
Oo Platfor ‘or gatnin
fe om. -for 9g G ane
stem
FetHleved from muitiple Aetabase Sy
Same +%me,.
Types of OLAP Servers
The three major types of OLAP servers are as follows:
* ROLAP
* MOLAP
* HOLAP
Relational OLAP (ROLAP)
Relational Online Analytical Processing (ROLAP) is primarily used for data stored in a relational database, where both the base data and dimension tables are stored as relational tables. ROLAP servers are used to bridge the gap between the relational back-end server and the client's front-end tools.
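The bridging role of a ROLAP server can be sketched as a function that turns a multidimensional request into an ordinary SQL statement. Everything here is an assumption for illustration: the `sales` table, the `ALLOWED_DIMS` set, and the `rolap_query` helper are not part of any real OLAP server.

```python
import sqlite3

ALLOWED_DIMS = {"product", "region", "quarter"}  # hypothetical dimension columns

def rolap_query(group_by, filters):
    """Build a GROUP BY statement for the requested dimensions and filters."""
    # Only known column names are placed into the SQL text; values go in as parameters.
    if not set(group_by) <= ALLOWED_DIMS or not set(filters) <= ALLOWED_DIMS:
        raise ValueError("unknown dimension")
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(amount) FROM sales"
    if filters:
        sql += " WHERE " + " AND ".join(f"{d} = ?" for d in filters)
    sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql, list(filters.values())

# The relational back end: base data stored as an ordinary table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Pen", "North", "Q1", 100.0), ("Pen", "South", "Q1", 80.0),
    ("Book", "North", "Q1", 200.0), ("Book", "North", "Q2", 50.0),
])

# Front-end request: total sales per product for quarter Q1.
sql, params = rolap_query(["product"], {"quarter": "Q1"})
result = conn.execute(sql, params).fetchall()
```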