
Data Science Process

Data science combines math, programming, and subject knowledge to analyze both organized and unorganized data for insights and predictions. The data science process involves defining the problem, collecting and cleaning data, conducting exploratory analysis, modeling, evaluating, deploying, communicating results, and monitoring the model's performance. This structured approach enables businesses to make informed decisions and develop applications across various domains.

Uploaded by

ABCUSER1

DATA SCIENCE PROCESS
What is Data Science?
Data Science is a field that combines math, programming, and subject knowledge to study data. It works with both organized data (like tables and numbers) and unorganized data (like text, images, or videos). The main purpose is to find useful information, make predictions, and create smart systems that can work automatically.

Why is it important?
• Businesses make better, faster, and more informed decisions.
• Applications: recommendation systems, fraud detection, customer analytics, healthcare predictions, etc.
Data Process Overview
• Define Problem
• Data Collection
• Data Cleaning & Preparation
• Exploratory Data Analysis (EDA)
• Modeling
• Evaluation
• Deployment
• Communication
• Monitoring & Maintenance
Step 1: Define the Problem
• Understand what question you are trying to answer.
• Make the problem clear and specific.
• Decide the scope of the project (what's included or excluded).
• Identify who will use the results.
• Set success criteria (how you will measure if the solution works).
• Example: Predict tomorrow's weather using past climate data.
Step 2: Data Collection
• Sources of data:
1. Databases and spreadsheets
2. Online sources and APIs (e.g., weather, social media, maps)
3. Sensors and smart devices (IoT)
4. Surveys, experiments, or manual records
5. Web data (collected through scraping or downloads)

• Challenges:
1. Missing or incomplete information
2. Errors and inconsistencies
3. Very large datasets that are hard to manage
4. Privacy and security concerns when handling sensitive data
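
As a rough illustration of collecting data from a spreadsheet-style source and spotting the first challenge (missing information), here is a minimal sketch using only the Python standard library. The sample text `RAW_CSV`, its column names, and the helper names `load_rows` and `count_missing` are assumptions made up for this example, not part of the original slides.

```python
import csv
import io

# Hypothetical sample standing in for a spreadsheet export or an API response.
RAW_CSV = """date,temp_c,humidity
2024-01-01,4.5,81
2024-01-02,,79
2024-01-03,6.1,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Count empty fields per column as a first check for incomplete data."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            if not (value or "").strip():
                missing[column] = missing.get(column, 0) + 1
    return missing


rows = load_rows(RAW_CSV)
print("records collected:", len(rows))
print("missing values per column:", count_missing(rows))
```

Reading from a real file or an online source would replace `io.StringIO(text)` with an open file or a downloaded response body; the counting step stays the same.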
Step 3: Data Cleaning & Preparation
Raw data is usually messy and not ready to use.
Problems often found:
• Missing values
• Duplicate records
• Wrong or inconsistent formats (like dates, units, or text)
• Outliers that don't fit the pattern

Cleaning makes the data accurate, consistent, and reliable.
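
The problems listed above can each be handled with a small, explicit rule. The sketch below is one possible approach, not a standard recipe: the sample records, the accepted date formats, the plausible temperature range, and the choice to fill gaps with the median are all assumptions for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records showing the problems listed above.
raw = [
    {"date": "2024-01-01", "temp_c": "4.5"},
    {"date": "2024-01-01", "temp_c": "4.5"},   # duplicate record
    {"date": "02/01/2024", "temp_c": ""},      # other date format, missing value
    {"date": "2024-01-03", "temp_c": "6.1"},
    {"date": "2024-01-04", "temp_c": "95.0"},  # implausible outlier
]


def parse_date(text):
    """Accept ISO or day/month/year dates and return ISO format."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(records, low=-60.0, high=60.0):
    """Normalize dates, drop duplicates, and fill missing or outlier values."""
    seen, rows = set(), []
    for rec in records:
        date = parse_date(rec["date"])
        temp = float(rec["temp_c"]) if rec["temp_c"].strip() else None
        if temp is not None and not (low <= temp <= high):
            temp = None  # assumption: treat out-of-range values as missing
        key = (date, temp)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        rows.append({"date": date, "temp_c": temp})
    # Assumption: fill gaps with the median of the valid values.
    fill = median(r["temp_c"] for r in rows if r["temp_c"] is not None)
    for r in rows:
        if r["temp_c"] is None:
            r["temp_c"] = fill
    return rows


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Whether an outlier should be dropped, capped, or kept depends on the problem; replacing it with the median is only one option.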
Step 4: Exploratory Data Analysis (EDA)
• Look closely at the data to understand it better.
• Find patterns, trends, and unusual values.
• Use visuals like charts and graphs:
1. Histograms → show distribution
2. Scatter plots → show relationships
3. Heatmaps → show correlations
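
To make the ideas of distribution and correlation concrete, here is a small sketch that builds a text histogram and computes a Pearson correlation by hand. The two lists of observations are invented for illustration, and the helper names `histogram` and `pearson` are assumptions; a plotting library would normally draw the charts named above.

```python
from collections import Counter
from math import sqrt
from statistics import mean

# Hypothetical daily observations (made up for illustration).
temps = [4.5, 5.3, 6.1, 7.0, 8.2, 9.1, 7.7, 6.4]
humidity = [81, 79, 75, 72, 66, 63, 70, 74]


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Distribution: one row of '#' marks per bin.
for edge, count in histogram(temps, 2.0).items():
    print(f"{edge:>4.1f} | {'#' * count}")

# Relationship: a value near -1 or +1 suggests a strong linear link.
print("correlation:", round(pearson(temps, humidity), 3))
```

A heatmap is simply this correlation computed for every pair of columns and shown as a colored grid.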
Step 5: Data Analysis / Modeling
• Use data to answer questions or make predictions.
• Types of analysis:
1. Descriptive → What happened?
2. Diagnostic → Why did it happen?
3. Predictive → What might happen next?
4. Prescriptive → What should be done?
• Methods used:
1. Statistical tests
2. Regression models
3. Forecasting techniques
4. Grouping or clustering data
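
As an example of a regression model used for predictive analysis, the sketch below fits a straight line with ordinary least squares and uses it to predict the next value. The data is invented and noise-free by construction so the result is easy to check; `fit_line` and `predict` are hypothetical helper names.

```python
from statistics import mean

# Hypothetical data: day number and observed temperature (made up).
days = [1, 2, 3, 4, 5]
temps = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(days, temps)
print("descriptive: average so far =", mean(temps))
print("predictive: estimate for day 6 =", predict(model, 6))
```

The average answers "what happened?", while the fitted line answers "what might happen next?". Real data is rarely this clean, which is why the model still has to be evaluated.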
Step 6: Evaluation
In this step, we test how well the model performs and whether it can make reliable predictions. The model's output is compared with actual results, and different measures are used depending on the type of problem. The goal is to make sure the model is accurate, generalizes well to new data, and is suitable for real use.

• Key points:
1. Compare predictions with actual results.
2. Use measures (metrics) based on problem type.
3. Ensure the model works on unseen data.
4. Select the best-performing model.
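
The key points above can be sketched with a few plain functions: hold some data back as unseen, then compare predictions with actual results using a metric that matches the problem type. The function names and sample values are assumptions for illustration; mean absolute error and RMSE are common choices for numeric predictions, and accuracy for category predictions.

```python
from math import sqrt


def holdout_split(items, test_fraction=0.25):
    """Keep the last part of the data aside as unseen test data."""
    n_test = max(1, round(len(items) * test_fraction))
    return items[:-n_test], items[-n_test:]


def mean_absolute_error(actual, predicted):
    """Average size of the errors, for numeric predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def accuracy(actual, predicted):
    """Share of correct labels, for category predictions."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical results (made up for illustration).
train, test = holdout_split(list(range(8)))
print("train size:", len(train), "test size:", len(test))
print("MAE:", mean_absolute_error([3, 5, 7], [2, 5, 9]))
print("RMSE:", round(rmse([3, 5, 7], [2, 5, 9]), 3))
print("accuracy:", accuracy(["rain", "sun", "sun", "rain"],
                            ["rain", "sun", "rain", "rain"]))
```

Holding out the most recent records suits time-ordered data such as weather; for data without a time order, a random split is the more usual choice.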
Step 7: Deployment
Once the model is ready and tested, it is put into real use so others can benefit from it. Deployment means making the model accessible through tools, apps, or systems where it can give predictions or insights in real time or on demand.

• Key points:
1. Integrate the model into actual systems or applications.
2. Can be used through APIs, dashboards, or apps.
3. Should be scalable (handle more data), reliable, and secure.
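
One way to picture "used through APIs" is a function that accepts a JSON request and returns a JSON prediction. The sketch below shows only that core step and is not a full service: `MODEL`, `handle_request`, and the field names `day` and `predicted_temp_c` are hypothetical, and a real deployment would put a web framework, authentication, and logging around it.

```python
import json

# Hypothetical fitted parameters, as if loaded from a saved model.
MODEL = {"slope": 2.0, "intercept": 1.0}


def handle_request(body):
    """Turn a JSON request body into a JSON prediction or error message."""
    try:
        payload = json.loads(body)
        day = float(payload["day"])
    except (ValueError, KeyError, TypeError):
        # Reliability: reject bad input instead of crashing.
        return json.dumps({"error": "expected JSON with a numeric 'day' field"})
    prediction = MODEL["slope"] * day + MODEL["intercept"]
    return json.dumps({"day": day, "predicted_temp_c": prediction})


print(handle_request('{"day": 6}'))
print(handle_request("not json"))
```

Keeping the prediction logic in a plain function like this makes it easy to reuse the same code behind an API, a dashboard, or an app.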
Step 8: Communication
After building and testing a model, the results need to be shared in a way that is easy to understand. This step is about turning technical outputs into clear insights that people can use. Visuals like charts, graphs, and dashboards make it easier to explain findings.

• Key points:
1. Present results clearly with visuals and reports.
2. Keep explanations simple and easy to understand.
3. Focus on insights, not technical jargon.
Step 9: Monitoring & Maintenance
Even after deployment, models need to be watched and updated. Over time, data can change, and the model may become less accurate. Regular monitoring ensures the model continues to perform well.

• Key points:
1. Track performance over time.
2. Retrain models with new data.
3. Update features if conditions change.
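
Tracking performance over time can be as simple as keeping the most recent prediction errors and raising a flag when their average grows too large. The sketch below is one possible rule, not a standard: the class name `ErrorMonitor`, the window size, and the threshold are assumptions chosen for illustration.

```python
from collections import deque


class ErrorMonitor:
    """Keep a rolling window of absolute errors and flag degradation."""

    def __init__(self, window=5, threshold=2.0):
        self.errors = deque(maxlen=window)  # old errors drop out automatically
        self.threshold = threshold

    def record(self, actual, predicted):
        """Store the error once the real outcome is known."""
        self.errors.append(abs(actual - predicted))

    def rolling_mae(self):
        """Mean absolute error over the current window."""
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_retraining(self):
        """True only when the window is full and the average error is too high."""
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.rolling_mae() > self.threshold


# Hypothetical stream of (actual, predicted) pairs; the model drifts later on.
monitor = ErrorMonitor(window=3, threshold=1.0)
stream = [(10, 10.5), (11, 10.8), (12, 12.1), (13, 16), (14, 18), (15, 19)]
for actual, predicted in stream:
    monitor.record(actual, predicted)
    print(round(monitor.rolling_mae(), 2), monitor.needs_retraining())
```

When the flag turns on, the usual next step is to retrain the model with newer data and check whether the input features still reflect current conditions.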
Conclusion
The Data Science process is not only about creating models, but about following a full set of steps to turn data into useful knowledge. It goes from defining the problem to collecting, cleaning, exploring, modeling, evaluating, deploying, communicating, and monitoring.
