Data Mining


Uploaded by

Rock Sai
Copyright
© All Rights Reserved
Shorts

1. What is data cleaning? Explain in detail.

Data cleaning is the process of correcting or deleting inaccurate, damaged, improperly formatted, duplicated, or insufficient data from a dataset. Even if results and algorithms appear to be correct, they are unreliable if the data is inaccurate. There are numerous ways for data to be duplicated or incorrectly labeled when merging multiple data sources.

Steps for Cleaning Data
Step 1: Remove duplicate or irrelevant observations.
Step 2: Fix structural errors.
Step 3: Filter unwanted outliers.
Step 4: Handle missing data.
Step 5: Validate and QA.

Techniques for Cleaning Data
* Ignore the tuples
* Regression
* Binning
* Fill the missing value
* Clustering

Characteristics of Data Cleaning
* Accuracy
* Uniformity
* Coherence
* Data verification
* Validity
* Clean data backflow

Benefits of Data Cleaning
When you have clean data, you can make decisions using the highest-quality information and eventually boost productivity. The following are some important advantages of data cleaning in data mining:
* Removal of inaccuracies when several data sources are involved.
* Clients are happier and employees are less annoyed when there are fewer mistakes.
* The capacity to map out the many functions and the planned uses of your data.
* Making decisions more quickly and with greater efficiency becomes possible with the use of data cleaning tools.

2. List out the components of a data warehouse.

A data warehouse is used to store historical data, which helps to make strategic decisions for the business. It is used for Online Analytical Processing (OLAP), which helps to analyze the data. The data warehouse helps business executives systematically organize, understand, and use their data
to make strategic decisions.

A typical data warehouse has four main components: a central database, ETL (extract, transform, load) tools, metadata, and access tools. All of these components are engineered for speed so that you can get results quickly and analyze data on the fly.
* Central database: A database serves as the foundation of your data warehouse. Traditionally, these have been standard relational databases running on premises or in the cloud.
* Data integration: Data is pulled from source systems and modified to align the information for rapid analytical consumption, using a variety of data integration approaches such as ETL (extract, transform, load) and ELT, as well as real-time data replication, bulk-load processing, data transformation, and data quality and enrichment services.
* Metadata: Metadata is data about your data. It specifies the source, usage, values, and other features of the data sets in your data warehouse. There is business metadata, which adds context to your data, and technical metadata, which describes how to access data, including where it resides and how it is structured.
* Data warehouse access tools: Access tools allow users to interact with the data in your data warehouse. Examples of access tools include query and reporting tools, application development tools, data mining tools, and OLAP tools.

3. List out the OLAP operations.

OLAP stands for Online Analytical Processing. OLAP operations are a set of techniques used in data mining and business intelligence for analyzing large data sets. They enable users to extract insights from multidimensional data sets quickly and efficiently. These operations help users explore data from multiple perspectives, gain insights into data relationships, and make informed decisions based on data analysis.
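To make "multidimensional data set" and "multiple perspectives" a little more concrete, here is a minimal Python sketch. The cube contents, the dimension names (product, quarter, region), and the helper `total_by` are all made up for illustration; they are not taken from the notes or from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical sales cube: each cell is keyed by (product, quarter, region)
# and holds one measure, the number of units sold.
DIMENSIONS = ("product", "quarter", "region")
CUBE = {
    ("laptop", "Q1", "north"): 10,
    ("laptop", "Q2", "north"): 12,
    ("laptop", "Q1", "south"): 7,
    ("phone", "Q1", "north"): 20,
    ("phone", "Q2", "south"): 15,
}


def total_by(cube, dimension):
    """Sum the measure over every dimension except the one named."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[axis]] += units
    return dict(totals)


# The same cells, viewed from three different perspectives.
print(total_by(CUBE, "product"))
print(total_by(CUBE, "quarter"))
print(total_by(CUBE, "region"))
```

Each call answers a different question about the same cells (sales per product, per quarter, per region), which is the sense in which a cube supports several perspectives at once.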
One example of an OLAP operation is the "Slice" operation. This operation allows users to extract data from a multidimensional cube by selecting a single dimension and a specific value for that dimension.

Operations in OLAP
* Drill Down: The Drill Down OLAP operation allows users to view more detailed data by expanding a particular dimension in a multidimensional cube. For example, a user can drill down into the product dimension to view data for individual products, or a user can expand quarterly sales data into monthly sales figures.
* Drill Up (Roll Up): It is the opposite of Drill Down. The Drill Up OLAP operation allows users to view data at a higher level of aggregation by collapsing a specific dimension in a multidimensional cube.
* Slice: The Slice OLAP operation allows users to extract data from a multidimensional cube by selecting a single dimension and a specific value for that dimension.
* Dice: The Dice OLAP operation allows users to extract data from a multidimensional cube by selecting multiple dimensions and specific values for each selected dimension.
* Pivot: The Pivot OLAP operation allows users to rotate the orientation of a multidimensional cube to view the data from a different perspective.
* Scoping: The Scoping OLAP operation involves restricting the database view to a specified subset.
* Screening: The Screening OLAP operation involves restricting the set of data retrieved against the data or members of a dimension.
* Other operations: drill across, drill through, sort, add measure, drop measure, union, and difference.

4. Define data cube? Explain in detail.

A data cube refers to a multidimensional data structure. That is, data within the data cube is explained by specific dimensional values.

Data Cube Classification
The data cube can be classified into two categories:
* Multidimensional data cube: It helps in
storing large amounts of data by making use of a multidimensional array. It increases efficiency by keeping an index of each dimension, so a multidimensional cube is able to retrieve data fast.
* Relational data cube: It helps in storing large amounts of data by making use of relational tables. Each relational table displays the dimensions of the data cube. It is slower compared to a multidimensional data cube.

Data Cube Operations
There are mainly 5 operations, listed below:
* Roll-up: This operation aggregates similar data attributes that share the same dimension.
* Drill-down: This operation is the reverse of the roll-up operation. It allows us to take particular information and then subdivide it further for finer-granularity analysis.
* Slicing: This operation filters out the unnecessary portions.
* Dicing: This operation does a multidimensional cut: it cuts not only one dimension but can also go to another dimension and cut a certain range of it.
* Pivot: This operation is very important from a viewing point of view. It transforms the data cube in terms of view.

Advantages
* Multidimensional analysis
* Interactivity
* Speed and efficiency

Disadvantages
* Complexity
* Data size limitations
* Performance issues

5. Define star schema? Explain in detail.

A star schema is the elementary form of a dimensional model, in which data are organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or login. A dimension includes reference data about the fact, such as date, item, or customer.

[Figure: Star Schema, showing dimension tables]

Characteristics of Star Schema
The star schema is intensely suitable for data warehouse database design because of the following features:
* It creates a de-normalized database that can quickly provide query responses.
* It provides a flexible design that can be changed easily or added to throughout the development cycle, and as the database grows.
* It provides a parallel in design to how end-users typically think of and use the data.
* It reduces the complexity of metadata for both developers and end-users.

Advantages
* Query performance
* Load performance and administration
* Built-in referential integrity
* Easily understood

Disadvantages
* Data redundancy
* Limited dimension depth
* Data inconsistency
* Rigidity

Essays

1. Differentiate OLTP and OLAP with features.

* OLAP stands for Online Analytical Processing. OLAP systems have the capability to analyze database information of multiple systems at the current time. The primary goal of an OLAP service is data analysis, not data processing.
* OLTP stands for Online Transaction Processing. OLTP has the work of administering day-to-day transactions in any organization. The main goal of OLTP is data processing, not data analysis.

Features of OLAP and OLTP
* Data type: OLTP uses real-time and transactional data from a single source, while OLAP uses historical and aggregated data from multiple sources.
* Database: OLTP uses a relational database that can handle multiple concurrent transactions, while OLAP uses a data warehouse that consolidates multiple data sources.
* Data view: OLTP focuses on current data, while OLAP generates and validates insights from data compiled over time.
* Purpose: OLTP is designed for real-time transaction processing, such as online banking, shopping, and order entry, while OLAP is ideal for analyzing large volumes of data to support decision-making, such as forecasting, planning, and budgeting.
* Users: OLTP systems are designed for use by frontline workers, such as cashiers and bank tellers, or for customer self-service applications. OLAP systems are designed for use by data scientists, business analysts, and knowledge workers.
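The contrast in the feature list above can be sketched in a few lines of Python with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the `orders` table, its columns, and the helpers `place_order` and `revenue_by_year` are hypothetical, and a single in-memory SQLite database stands in for both systems, whereas the notes describe OLAP as running against a separate data warehouse.

```python
import sqlite3

# Assumption: one in-memory database plays both roles for the sake of the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, order_year INTEGER, amount REAL)"
)


def place_order(conn, customer, order_year, amount):
    """OLTP-style work: one short write transaction touching a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, order_year, amount) VALUES (?, ?, ?)",
            (customer, order_year, amount),
        )
    return cur.lastrowid


def revenue_by_year(conn):
    """OLAP-style work: a read-only aggregate over the accumulated history."""
    rows = conn.execute(
        "SELECT order_year, SUM(amount) FROM orders "
        "GROUP BY order_year ORDER BY order_year"
    )
    return rows.fetchall()


place_order(conn, "alice", 2022, 120.0)
place_order(conn, "bob", 2022, 80.0)
place_order(conn, "alice", 2023, 50.0)
print(revenue_by_year(conn))
```

The write path handles one record at a time and must stay fast and consistent; the analytical query reads many rows and returns a summary, which is why the two workloads are usually served by differently designed systems.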
Difference between OLAP and OLTP

Category: OLAP (Online Analytical Processing) vs. OLTP (Online Transaction Processing)

Definition
  OLAP: It is well known as an online database query management system.
  OLTP: It is well known as an online database modifying system.
Data source
  OLAP: Consists of historical data from various databases.
  OLTP: Consists of only operational, current data.
Method used
  OLAP: It makes use of a data warehouse.
  OLTP: It makes use of a standard database management system (DBMS).
Application
  OLAP: It is subject-oriented. Used for data mining, analytics, decision making, etc.
  OLTP: It is application-oriented. Used for business tasks.
Normalized
  OLAP: In an OLAP database, tables are not normalized.
  OLTP: In an OLTP database, tables are normalized (3NF).
Usage of data
  OLAP: The data is used in planning, problem solving, and decision-making.
  OLTP: The data is used to perform day-to-day fundamental operations.
Task
  OLAP: It provides a multidimensional view of different business tasks.
  OLTP: It reveals a snapshot of present business tasks.
Purpose
  OLAP: It serves the purpose of extracting information for analysis and decision-making.
  OLTP: It serves the purpose of inserting, updating, and deleting information in the database.
Volume of data
  OLAP: A large amount of data is stored, typically in TB or PB.
  OLTP: The size of the data is relatively small, in MB and GB, as the historical data is archived.
Queries
  OLAP: Relatively slow, as the amount of data involved is large. Queries may take hours.
  OLTP: Very fast, as the queries operate on 5% of the data.
Update
  OLAP: The OLAP database is not often updated. As a result, data integrity is unaffected.
  OLTP: The data integrity constraint must be maintained in an OLTP database.
Backup and recovery
  OLAP: It only needs backup from time to time as compared to OLTP.
  OLTP: The backup and recovery
process is maintained rigorously.
Processing time
  OLAP: The processing of complex queries can take a lengthy time.
  OLTP: It is comparatively fast in processing because of simple and straightforward queries.
Types of users
  OLAP: This data is generally managed by the CEO, MD, and GM.
  OLTP: This data is managed by clerks and managers.
Operations
  OLAP: Only read and rarely write operations.
  OLTP: Both read and write operations.
Updates
  OLAP: With lengthy, scheduled batch operations, data is refreshed on a regular basis.
  OLTP: The user initiates data updates, which are brief and quick.
Nature of audience
  OLAP: The process is focused on the market.
  OLTP: The process is focused on the customer.
Database design
  OLAP: Design with a focus on the subject.
  OLTP: Design that is focused on the application.
Productivity
  OLAP: Improves the efficiency of business analysts.
  OLTP: Enhances the user's productivity.
Example
  OLAP: Good for analyzing trends and predicting customer behavior.
  OLTP: Good for processing payments, customer data management, and order processing.

2. Explain the three-tier data warehouse architecture with neat diagram.

Data warehouses usually have a three-level (tier) architecture that includes:
1. Bottom Tier (Data Warehouse Server)
2. Middle Tier (OLAP Server)
3. Top Tier (Front-end Tools)

* A bottom tier that consists of the data warehouse server, which is almost always an RDBMS. It may include several specialized data marts and a metadata repository.
* A middle tier which consists of an OLAP server for fast querying of the data warehouse. The OLAP server is implemented using either:
  1. A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that maps functions on multidimensional data to standard relational operations.
  2. A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose server that directly implements multidimensional
information and operations.
* A top tier that contains front-end tools for displaying results provided by OLAP, as well as additional tools for data mining of the OLAP-generated data.

[Figure: A three-tier architecture for a data warehouse system, showing external data, ETL tools, the data warehouse layer with data marts, OLAP tools, and data mining and analysis tools]

The overall data warehouse architecture is shown in the figure.

[Figure: Three-Tier Data Warehouse Architecture, showing operational databases, external sources, metadata, and monitoring and administration]

Principles of Data Warehousing
* Load performance: Data warehouses require incremental loading of new data on a periodic basis within narrow time windows. Performance of the load process should be measured in hundreds of millions of rows and gigabytes per hour.
* Load processing: Many phases must be completed to load new or updated data into the data warehouse, including data conversion, filtering, reformatting, indexing, and metadata update.
* Data quality management: Fact-based management demands the highest data quality. The warehouse ensures local consistency, global consistency, and referential integrity despite "dirty" sources and massive database size.
* Query performance: Fact-based management must not be slowed by the performance of the data warehouse RDBMS; large, complex queries must complete in seconds, not days.
* Terabyte scalability: Data warehouse sizes are growing at astonishing rates. Today they range from a few gigabytes to hundreds of gigabytes, with terabyte-sized data warehouses as well.

3. (a) Compare the types of OLAP servers.
   (b) Discuss about the computation of data cubes.

(a) OLAP Servers

Online Analytical Processing (OLAP) refers to a set of software tools used for data analysis in order to make business decisions.
OLAP provides a platform for gaining insights from databases retrieved from multiple database systems at the same time.

Types of OLAP Servers
The three major types of OLAP servers are as follows:
* ROLAP
* MOLAP
* HOLAP

Relational OLAP (ROLAP)
Relational Online Analytical Processing (ROLAP) is primarily used for data stored in a relational database, where both the base data and dimension tables are stored as relational tables. ROLAP servers are used to bridge the gap between the relational back-end server and the client's front-end tools.
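As a rough illustration of the ROLAP idea, the sketch below stores a tiny fact table and two dimension tables as ordinary relational tables and answers cube-style questions with plain SQL. It uses Python's standard-library `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_date`), the sample rows, and the helper functions are assumptions made for this example, not part of the notes or of any specific ROLAP server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star-shaped layout: one fact table plus two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER
);
INSERT INTO dim_product VALUES
    (1, 'laptop', 'computers'), (2, 'tablet', 'computers'), (3, 'phone', 'mobile');
INSERT INTO dim_date VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 12), (2, 1, 5), (3, 1, 20), (3, 2, 15);
""")


def roll_up_by_category(conn):
    """Roll-up: aggregate product-level facts to the coarser category level."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.units)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """)
    return rows.fetchall()


def slice_by_quarter(conn, quarter):
    """Slice: fix one value of the date dimension and keep the rest."""
    rows = conn.execute("""
        SELECT p.name, SUM(f.units)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE d.quarter = ?
        GROUP BY p.name
        ORDER BY p.name
    """, (quarter,))
    return rows.fetchall()


print(roll_up_by_category(conn))
print(slice_by_quarter(conn, "Q1"))
```

Here a roll-up becomes a JOIN with GROUP BY on a coarser attribute, and a slice becomes a WHERE clause on one dimension, which is one plausible reading of how a ROLAP layer can translate multidimensional requests into standard relational operations.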
