
21BCG10035 CDS3005

The document discusses various aspects of data science, including data processing techniques, statistical analysis, and the importance of data cleaning and integration. It highlights the significance of using scientific methods to extract insights from structured and unstructured data, as well as the challenges posed by big data. Additionally, it covers concepts such as variance, covariance matrices, eigenvalues, and eigenvectors in relation to data analysis.

Name: Akshay Krishna

Reg. no.: 21BCG10035


Data Science Foundations
Slot: A11 + A12

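1. Given data: 2, 3, 4, 5, 6, 7; 1, 5, 3, 6, 7, 8.

The two-dimensional patterns will be
(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).

Given data:
$x_1 = (2, 1)$
$x_2 = (3, 5)$
$x_3 = (4, 3)$
$x_4 = (5, 6)$
$x_5 = (6, 7)$
$x_6 = (7, 8)$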

The mean vector ($\mu$):
$$\mu = \left(\frac{2+3+4+5+6+7}{6},\ \frac{1+5+3+6+7+8}{6}\right) = (4.5,\ 5)$$
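Also, subtract the mean vector ($\mu$) from the given feature vectors:

$x_1 - \mu = (2 - 4.5,\ 1 - 5) = (-2.5,\ -4)$
$x_2 - \mu = (3 - 4.5,\ 5 - 5) = (-1.5,\ 0)$
$x_3 - \mu = (4 - 4.5,\ 3 - 5) = (-0.5,\ -2)$
$x_4 - \mu = (5 - 4.5,\ 6 - 5) = (0.5,\ 1)$
$x_5 - \mu = (6 - 4.5,\ 7 - 5) = (1.5,\ 2)$
$x_6 - \mu = (7 - 4.5,\ 8 - 5) = (2.5,\ 3)$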
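The mean and the mean-subtracted vectors can be checked with a few lines of Python. This is only a minimal sketch using the six points from this question; the variable names are chosen here for illustration.

```python
# Data points as (x, y) pairs, taken from the worked example.
points = [(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)]
n = len(points)

# Mean vector: average each coordinate separately.
mean = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Centered data: subtract the mean vector from every point.
centered = [(p[0] - mean[0], p[1] - mean[1]) for p in points]

print(mean)      # (4.5, 5.0)
print(centered)
```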

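Now the covariance matrix will be

$$\text{Covariance matrix} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T$$

$$\text{Covariance matrix} = \frac{m_1 + m_2 + m_3 + m_4 + m_5 + m_6}{6}, \qquad m_i = (x_i - \mu)(x_i - \mu)^T$$

On adding the above matrices and dividing by 6, we get

$$\text{Covariance matrix} = \frac{1}{6}\begin{pmatrix} 17.5 & 22 \\ 22 & 34 \end{pmatrix}$$

$$\text{Covariance matrix} = \begin{pmatrix} 2.92 & 3.67 \\ 3.67 & 5.67 \end{pmatrix}$$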
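The covariance matrix can be reproduced as follows. This is a sketch that assumes division by n = 6, as in the working here, rather than by n - 1; the function name `covariance_matrix` is made up for this example.

```python
def covariance_matrix(points):
    """Return the 2x2 covariance matrix of (x, y) points, dividing by n."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    return [[sxx, sxy], [sxy, syy]]

points = [(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)]
cov = covariance_matrix(points)
rounded = [[round(v, 2) for v in row] for row in cov]
print(rounded)  # [[2.92, 3.67], [3.67, 5.67]]
```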
367
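Now, the eigenvalues and eigenvectors of the covariance matrix $M$:

$$|M - \lambda I| = 0$$

From here,

$$(2.92 - \lambda)(5.67 - \lambda) - (3.67 \times 3.67) = 0$$
$$16.56 - 2.92\lambda - 5.67\lambda + \lambda^2 - 13.47 = 0$$
$$\lambda^2 - 8.59\lambda + 3.09 = 0$$

Solving this, we get

$$\lambda_1 \approx 8.22, \qquad \lambda_2 \approx 0.38$$

The value of $\lambda_2$ is very small compared to $\lambda_1$, so $\lambda_2$ will be left out.
The eigenvector for $\lambda_1 = 8.22$ will be given by

$$\begin{pmatrix} 2.92 & 3.67 \\ 3.67 & 5.67 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = 8.22\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$$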

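By solving these, we get

$$2.92X_1 + 3.67X_2 = 8.22X_1 \qquad (1)$$
$$3.67X_1 + 5.67X_2 = 8.22X_2 \qquad (2)$$

From (1) and (2), $X_1 = 0.69X_2$.
From (2), the eigenvector is

$$\text{Eigenvector} = \begin{pmatrix} 2.55 \\ 3.67 \end{pmatrix}$$

$$\text{Principal component} = \begin{pmatrix} 2.55 \\ 3.67 \end{pmatrix}$$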
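The eigenvalues of a symmetric 2x2 matrix follow from the quadratic formula, so the result can be checked without any library. The sketch below uses the unrounded covariance entries (17.5/6, 22/6, 34/6), so the larger eigenvalue comes out near 8.21 rather than the hand-rounded 8.22. The function name `principal_component` is hypothetical, and the sketch assumes the off-diagonal entry is non-zero.

```python
import math

def principal_component(cov):
    """Eigenvalues and a unit eigenvector (largest eigenvalue) of a symmetric 2x2 matrix."""
    a, b = cov[0]
    _, d = cov[1]
    trace = a + d
    det = a * d - b * b
    # Roots of lambda^2 - trace * lambda + det = 0.
    disc = math.sqrt(trace * trace - 4 * det)
    lam1 = (trace + disc) / 2
    lam2 = (trace - disc) / 2
    # From (a - lam1) * X1 + b * X2 = 0, one solution is (X1, X2) = (b, lam1 - a).
    vx, vy = b, lam1 - a
    norm = math.hypot(vx, vy)
    return lam1, lam2, (vx / norm, vy / norm)

cov = [[17.5 / 6, 22 / 6], [22 / 6, 34 / 6]]
lam1, lam2, vec = principal_component(cov)
print(round(lam1, 2), round(lam2, 2))  # about 8.21 and 0.38
print(round(vec[0] / vec[1], 2))       # ratio X1 / X2, about 0.69
```

The unit vector printed here points in the same direction as (2.55, 3.67); only the scaling differs.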

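2. Given data: 40, 65, 30, 58, 54, 43

To find: variance

Solution:
$$\bar{x} = \frac{40 + 65 + 30 + 58 + 54 + 43}{6} = \frac{290}{6} = 48.33$$

Now,
$$\sigma^2(x) = \frac{(40-48.33)^2 + (65-48.33)^2 + (30-48.33)^2 + (58-48.33)^2 + (54-48.33)^2 + (43-48.33)^2}{5}$$

$$= \frac{69.39 + 277.89 + 335.99 + 93.51 + 32.15 + 28.41}{5} \approx \frac{837.33}{5}$$

$$\approx 167.47$$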
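The variance can be cross-checked with Python's standard `statistics` module. In this sketch `statistics.variance` divides by n - 1 = 5, matching the working above, while `statistics.pvariance` divides by n = 6 and is shown only for comparison.

```python
import statistics

data = [40, 65, 30, 58, 54, 43]

mean = statistics.mean(data)
sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by n

print(round(mean, 2), round(sample_var, 2), round(population_var, 2))
# 48.33 167.47 139.56
```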

3. (a) Data science is a field that combines a variety of tools, algorithms and techniques from various domains, such as mathematics, statistics, computer science and domain expertise, to extract knowledge and insights from structured and unstructured data. Data scientists use a variety of techniques, including machine learning, data mining and statistical analysis, to analyze and interpret data and communicate their findings through visualizations and reports.
Big data refers to large and complex datasets that are difficult to process using traditional data processing tools. These datasets can come from multiple sources and can be structured, semi-structured or unstructured. The volume, velocity and variety of big data can make it challenging to store, process and analyze, but the insights and knowledge that can be gained from it make it a valuable resource for businesses, researchers and organisations.
Data science is a field that involves using scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. There are many reasons why data science is important. The main reasons are as follows:

(i) The explosion of data: In today's digital world, we generate vast amounts of data from a wide variety of sources, including internet searches.

(ii) The complexity of data: The data generated nowadays is often complex and diverse, and can include structured data as well as unstructured data.

(iii) The need to make data-driven decisions: In today's fast-paced, competitive world, businesses and organisations need data-driven decisions to streamline their processes, improve efficiency and reduce costs.

(iv) The desire to improve efficiency and productivity: Data science can help organisations to streamline their processes and improve efficiency.
(b) In inferential statistics, a population is the entire group of individuals or objects that we are interested in studying. A sample is a subset of the population that we collect data from and use to make inferences about the population.

The reason we use samples in inferential statistics is that it is often impractical or impossible to collect data from the entire population. Instead, we collect data from a sample and use statistical techniques to draw conclusions about the population based on the sample.

There are several reasons why a data scientist might need to use a sample rather than the entire population:

(i) Time and resources: It is often not feasible to collect data from the entire population, especially if the population is large.

(ii) Accuracy: In some cases, collecting data from a sample can be more accurate than collecting data from the entire population.

(iii) Privacy: In some cases, collecting data from the entire population may be impractical due to privacy or ethical concerns.

(iv) Bias: In some cases, a sample may be more representative of the population than data collected from the entire population.
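The idea of estimating a population value from a sample can be illustrated with a small simulation. This is only a sketch: the population (the integers 1 to 10000) and the sample size of 500 are made up for illustration and do not come from the question.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: the integers 1..10000.
population = list(range(1, 10001))
population_mean = statistics.mean(population)

# Simple random sample without replacement.
sample = random.sample(population, 500)
sample_mean = statistics.mean(sample)

print(population_mean, sample_mean)
```

The sample mean should land close to the population mean even though only 5% of the population was examined.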
4. Data cleaning: This involves identifying and correcting or removing errors and inconsistencies in the data. For example, suppose we have a dataset of customers' information that includes the customer's name and address.

Data integration: This involves combining data from multiple sources into a single dataset. For example, we might have customer data stored in one source and related data in another. To analyze the data, we might need to integrate the two datasets.

Data reduction: This involves reducing the size of the dataset by selecting a subset of the data or by aggregating the data. For example, we might reduce the size of a large dataset by selecting only the data that is relevant to a specific analysis.

Data transformation: This involves transforming the data into a format that is more suitable for analysis. For example, we might need to convert data fields to a standard format, or we might need to convert categorical data into numerical data.
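The four steps can be sketched on a tiny dataset. Everything in this example is hypothetical: the customer records, the `orders` lookup, the field names and the numeric tier codes are invented to illustrate the steps and are not part of the question.

```python
# Hypothetical customer records; names and fields are illustrative only.
customers = [
    {"id": 1, "name": " Asha ", "address": "Bhopal", "tier": "gold"},
    {"id": 2, "name": "Ravi", "address": None, "tier": "silver"},
    {"id": 1, "name": "Asha", "address": "Bhopal", "tier": "gold"},  # duplicate
    {"id": 3, "name": "Meera", "address": "Indore", "tier": "silver"},
]
orders = {1: 250.0, 3: 120.0}  # second source: total spend keyed by customer id

# Cleaning: trim whitespace, drop rows with a missing address, skip duplicate ids.
cleaned, seen = [], set()
for row in customers:
    if row["address"] is None or row["id"] in seen:
        continue
    seen.add(row["id"])
    cleaned.append({**row, "name": row["name"].strip()})

# Integration: attach the spend from the second source.
integrated = [{**row, "spend": orders.get(row["id"], 0.0)} for row in cleaned]

# Reduction: keep only the fields needed for the analysis.
reduced = [{"id": r["id"], "tier": r["tier"], "spend": r["spend"]} for r in integrated]

# Transformation: map the categorical tier to a numeric code.
tier_codes = {"silver": 0, "gold": 1}
transformed = [{**r, "tier": tier_codes[r["tier"]]} for r in reduced]

print(transformed)
# [{'id': 1, 'tier': 1, 'spend': 250.0}, {'id': 3, 'tier': 0, 'spend': 120.0}]
```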

N = 120        Predicted: No    Predicted: Yes

Actual: No          30               10

Actual: Yes          8               72

Accuracy = (TP + TN)/(TP + TN + FP + FN)

= (30 + 72)/120 = 0.85

Precision = TP/(TP + FP)

= 30/(30 + 8) = 0.79

Recall = TP/(TP + FN)

= 30/(30 + 10) = 0.75

F1 score = 2 × (Precision × Recall)/(Precision + Recall)

= 2 × (0.79 × 0.75)/(0.79 + 0.75)

= 2 × (0.5925/1.54)

= 2 × 0.3847

= 0.7694 ≈ 0.77
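These metrics can be verified with a short function. This is a sketch using the counts from the calculations above (TP = 30, TN = 72, FP = 8, FN = 10); the function name `classification_metrics` is made up for this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=30, tn=72, fp=8, fn=10)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))
# 0.85 0.79 0.75 0.77
```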
