
Data Analytics Unit 2

The document outlines steps in regression modeling, including data collection, preprocessing, feature selection, model building, evaluation, and prediction, with applications in business, healthcare, engineering, and finance. It also discusses multivariate analysis techniques such as factor analysis, cluster analysis, and principal component analysis, emphasizing their objectives and applications across various fields. Additionally, it introduces Bayesian modeling and inference, highlighting its utility in uncertain situations and its applications in forecasting and fraud detection.

Uploaded by

anngeetu

Regression Modeling

Regression modeling is a fundamental statistical technique that examines the relationship between one dependent variable (outcome) and one or more independent variables (predictors or features). It helps in understanding, modeling, and predicting the dependent variable based on the behavior of the independent variables.

Objectives of Regression Modeling

• To identify and quantify relationships between variables.
• To predict future outcomes based on historical data.
• To understand the influence of independent variables on a dependent variable.
• To identify trends and make informed decisions in various fields such as economics, medicine, engineering, and marketing.

Types of Regression Models

1. Linear Regression:
• Establishes a linear relationship between dependent and independent variables using the equation $y = mx + c$, where $y$ is the dependent variable, $x$ is the independent variable, $m$ is the slope, and $c$ is the intercept.
• Assumes that the relationship between the variables is linear and that the residuals (errors) are normally distributed.
• Suitable for predicting continuous outcomes.
Example: Predicting house prices based on size, number of rooms, and location.

2. Multiple Linear Regression:
• Extends linear regression to include multiple independent variables.
• The equation becomes $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \epsilon$, where $b_0$ is the intercept, $b_1, b_2, \dots, b_n$ are the coefficients, and $\epsilon$ is the error term.
Example: Predicting a company's sales revenue based on advertising spend, number of salespeople, and seasonal effects.

3. Logistic Regression:
• Used for binary classification problems where the outcome is categorical (e.g., 0 or 1, Yes or No).
• Employs the sigmoid function $\sigma(x)$ to model probabilities.
• Suitable for predicting the likelihood of categorical outcomes.
Example: Diagnosing whether a patient has a disease based on medical test results.
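As a rough illustration of how logistic regression turns a weighted sum of features into a probability, the sketch below implements the sigmoid function σ(x) = 1 / (1 + e^(-x)) and applies it to a single observation. The weights, bias, and feature values are hypothetical numbers chosen for the example; in practice they would be estimated from data.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic (sigmoid) function: maps any real number into the range 0 to 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(features, weights, bias):
    """Probability of the positive class (label 1) for one observation."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(score)


# Hypothetical coefficients and inputs, for illustration only.
weights = [0.8, -0.5]
bias = -0.2
p = predict_proba([1.5, 0.4], weights, bias)

# A common convention is to predict class 1 when the probability is at least 0.5.
label = 1 if p >= 0.5 else 0
print(f"probability = {p:.3f}, predicted label = {label}")
```

The 0.5 threshold is a convention rather than a requirement; applications such as disease screening may prefer a lower threshold to reduce missed cases.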
Steps in Regression Modeling

(a) Data Collection: Gather data relevant to the problem, ensuring accuracy and completeness.
(b) Data Preprocessing: Handle missing values, scale variables, and identify outliers.
(c) Feature Selection: Identify the most significant predictors using methods like correlation analysis or stepwise selection.
(d) Model Building: Fit the regression model using statistical software or programming languages like Python or R.
(e) Model Evaluation: Assess the model's performance using metrics such as R², Mean Squared Error (MSE), or Mean Absolute Error (MAE).
(f) Prediction: Use the model to make predictions on new or unseen data.
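The model building, evaluation, and prediction steps can be sketched in a few lines of plain Python. This is a minimal example under stated assumptions: the data are made-up numbers, the model is a one-variable least-squares line, and the function names `fit_simple_linear` and `evaluate` are illustrative rather than taken from any library.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(y_true, y_pred):
    """Return R^2, MSE, and MAE for a set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
    }


# Made-up numbers, for illustration only.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
test_x = [6.0, 7.0]
test_y = [12.0, 13.9]

slope, intercept = fit_simple_linear(train_x, train_y)   # (d) model building
predictions = [slope * x + intercept for x in test_x]    # (f) prediction on unseen data
metrics = evaluate(test_y, predictions)                  # (e) model evaluation
print(slope, intercept, metrics)
```

Evaluating on data that was held out from fitting, as done here, gives a more honest picture of predictive performance than scoring the model on its own training data.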

Applications of Regression Modeling

• Business: Forecasting sales, revenue, and market trends.


• Healthcare: Predicting disease outcomes or treatment effectiveness.
• Engineering: Modeling system reliability and performance.
• Finance: Estimating stock prices or credit risks.

Example: Predicting Sales Revenue. A retail company wants to predict its monthly sales revenue based on advertising spend and the number of active customers. Using multiple linear regression, the dependent variable is sales revenue, and the independent variables are advertising spend and customer count. The fitted model could help optimize the allocation of marketing budgets for maximum revenue.
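A minimal sketch of this example in plain Python follows. The figures are invented so that revenue equals exactly 10 + 2 × advertising spend + 0.5 × customer count, which makes it easy to check that the fit recovers those coefficients. The coefficients are estimated by solving the normal equations (XᵀX)b = Xᵀy with Gaussian elimination; this is one simple way to fit the model, and statistical libraries generally use more numerically robust methods.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                       # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_multiple_linear(rows, ys):
    """Return [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    design = [[1.0] + list(r) for r in rows]             # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented monthly data: (advertising spend, active customers) -> sales revenue.
observations = [(10, 100), (20, 150), (30, 120), (40, 200), (50, 180)]
revenue = [80.0, 125.0, 130.0, 190.0, 200.0]

coefficients = fit_multiple_linear(observations, revenue)
b0, b_ads, b_customers = coefficients

# Predicted revenue for a hypothetical month with spend 35 and 160 customers.
forecast = b0 + b_ads * 35 + b_customers * 160
print(coefficients, forecast)
```

Each fitted coefficient estimates the change in revenue for a one-unit increase in its predictor with the other held fixed, which is what makes the model useful for comparing budget allocations.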

Multivariate Analysis

Multivariate analysis is a statistical technique used to analyze data involving multiple variables simultaneously. It helps in understanding the relationships, patterns, and structure within datasets where more than two variables are interdependent.

Objectives of Multivariate Analysis

• To identify relationships and dependencies among multiple variables.


• To reduce the dimensionality of datasets while retaining important information.
• To classify or group data into meaningful categories.

• To predict outcomes based on multiple predictors.


Types of Multivariate Analysis Techniques

(a) Factor Analysis:
• Identifies underlying factors or latent variables that explain the observed data.
• Reduces a large set of variables into smaller groups based on their correlations.
• Uses methods like Principal Axis Factoring (PAF) or Maximum Likelihood Estimation (MLE).
Example: In psychology, factor analysis is used to identify latent traits like intelligence or personality from observed behavior.

(b) Cluster Analysis:
• Groups similar data points into clusters based on their characteristics.
• Common algorithms include K-Means, Hierarchical Clustering, and DBSCAN.
• Does not require pre-defined labels and is used for exploratory data analysis.
Example: Customer segmentation in marketing to classify customers into groups like high-value, low-value, or occasional buyers.

(c) Principal Component Analysis (PCA):
• A dimensionality reduction technique that transforms data into a set of linearly uncorrelated components (principal components).
• Retains as much variance as possible while reducing the number of variables.
• Helps visualize high-dimensional data in 2D or 3D spaces.
Example: Simplifying genomic data by reducing thousands of genetic variables to a manageable number of principal components.
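To make the idea behind PCA concrete, the sketch below finds the first principal component of a small two-variable dataset using only the standard library. It centers the data, builds the sample covariance matrix, and then uses power iteration to approximate the leading eigenvector. Power iteration is chosen here only for brevity; it is not the method most libraries use, and it assumes the covariance matrix is not all zeros. The data points are made up.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def first_principal_component(cov, iterations=200):
    """Approximate the leading eigenvector and eigenvalue by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d                      # arbitrary unit start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))   # Rayleigh quotient
    return v, eigenvalue


# Made-up data in which the second variable is roughly twice the first.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]

cov, centered = covariance_matrix(data)
component, variance = first_principal_component(cov)

# Share of total variance captured by the first component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = variance / total_variance

# One-dimensional representation: project each centered point onto the component.
scores = [sum(c * v for c, v in zip(row, component)) for row in centered]
print(component, explained_ratio)
```

Because the two variables are almost perfectly correlated, a single component retains nearly all of the variance, which is the situation in which reducing the number of variables loses little information.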

Applications of Multivariate Analysis

• Marketing: Customer segmentation analysis and product positioning.
• Finance: Risk assessment, portfolio optimization, and fraud detection.
• Healthcare: Analyzing patient data to predict disease outcomes or treatment responses.
• Psychology: Identifying personality traits or cognitive factors using survey data.
• Environment: Studying the impact of multiple environmental factors.

Steps in Multivariate Analysis

(a) Define the Problem: Clearly identify the objectives and variables to be analyzed.

(b) Collect Data: Gather accurate and relevant data for all variables.

(c) Preprocess Data: Handle missing values, standardize variables, and detect outliers.

(d) Choose the Method: Select an appropriate multivariate technique based on the objective.

(e) Apply the Method: Use statistical software (e.g., Python, R, SPSS) to conduct the analysis.

(f) Interpret Results: Understand the output, identify patterns, and draw actionable insights.

Advantages of Multivariate Analysis

• Handles complex datasets with multiple interdependent variables.

• Reduces dimensionality while retaining essential information.

• Enhances predictive accuracy in machine learning models.

• Provides deeper insights for decision-making.

Limitations of Multivariate Analysis

• Requires a large sample size to achieve reliable results.

• Sensitive to multicollinearity among variables.

• Interpretation of results can be challenging for non-experts.

Example: Customer Segmentation in Marketing. A retail company wants to segment its customer base to improve targeted marketing campaigns. Using cluster analysis, customers are grouped into clusters based on data such as age, income, purchase frequency, and product preferences. The company identifies three main segments:

(a) High-income, frequent buyers.

(b) Middle-income, occasional buyers.

(c) Low-income, infrequent buyers.

The insights help the company design personalized offers and allocate marketing
budgets effectively.
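A segmentation of this kind could be sketched with K-Means, one of the clustering algorithms commonly used for this task. The example below is a simplified illustration: it uses only two made-up features (income and purchases per month), standardizes them so that neither dominates the distance calculation, and picks starting centroids with a deterministic farthest-first rule instead of the random initialization that K-Means implementations often use.

```python
import math


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-scores)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def k_means(points, k, max_iter=100):
    """Alternate assignment and centroid-update steps until the labels stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up customers: (annual income in thousands, purchases per month).
customers = [
    (95, 12), (100, 11), (90, 13),   # high income, frequent buyers
    (55, 5), (60, 6), (50, 4),       # middle income, occasional buyers
    (20, 1), (25, 2), (22, 1),       # low income, infrequent buyers
]

labels, centroids = k_means(standardize(customers), k=3)
print(labels)
```

The cluster numbers themselves carry no meaning; interpreting each cluster as a segment requires inspecting the customers assigned to it.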

3 Bayesian Modeling, Inference, and Bayesian Networks

1. Bayesian Modeling
• Bayesian modeling is a statistical approach that applies Bayes' theorem to update probabilities as new evidence or information becomes available.
• It incorporates prior knowledge (prior probabilities) along with new evidence (likelihood) to compute updated probabilities (posterior probabilities).
• Bayes' theorem is expressed as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where:
— $P(A \mid B)$: Posterior probability (probability of A given B).
— $P(B \mid A)$: Likelihood (probability of observing B given A).
— $P(A)$: Prior probability (initial belief about A).
— $P(B)$: Evidence (probability of observing B).
• Bayesian modeling is particularly useful in situations with uncertainty or incomplete data.
• Applications:
— Forecasting in finance, weather, and sports.
— Fraud detection in transactions.
— Medical diagnosis based on symptoms and test results.
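A small worked example may help show how Bayes' theorem updates a belief. The sketch below applies it to a hypothetical diagnostic test; the prevalence, sensitivity, and false-positive rate are invented numbers, and the second update assumes that the two test results are independent given the patient's true condition.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) by Bayes' theorem, where A = has the disease and B = positive test.

    prior               -- P(A), belief before seeing the test result
    likelihood          -- P(B | A), probability of a positive test if diseased
    false_positive_rate -- P(B | not A), probability of a positive test if healthy
    """
    # Evidence P(B) via the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
prior = 0.01
after_one_positive = posterior(prior, 0.95, 0.05)

# The posterior becomes the new prior when a second positive result arrives.
after_two_positives = posterior(after_one_positive, 0.95, 0.05)

print(f"after one positive test:  {after_one_positive:.3f}")
print(f"after two positive tests: {after_two_positives:.3f}")
```

Under these assumed numbers a single positive result still leaves the disease fairly unlikely because the condition is rare, while a second positive result raises the probability substantially.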

2. Inference in Bayesian Modeling

• Inference involves the process of deducing likely outcomes based on prior knowledge and new evidence.
• It answers questions such as:
— What is the probability of a hypothesis being true given the observed data?
— How should we update our belief about a hypothesis when new data is observed?
• Types of Bayesian inference:
(a) Point Estimation: Finds the single best estimate of a parameter (e.g., Maximum A Posteriori (MAP)).
(b) Interval Estimation: Provides a range of values (credible intervals) where a parameter likely lies.
(c) Posterior Predictive Checks: Validates models by comparing predictions to observed data.
• Advantages:
— Allows for dynamic updates as new data becomes available.
— Handles uncertainty effectively by integrating prior information.
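Point and interval estimation can be illustrated with a simple model for an unknown success probability. The sketch below assumes a Beta prior with binomial data, a standard textbook pairing in which the posterior is again a Beta distribution. The prior parameters and the observed counts are made up, and the credible interval is approximated by sampling from the posterior with a fixed seed rather than computed exactly.

```python
import random


def update_beta(prior_a, prior_b, successes, trials):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def map_estimate(a, b):
    """Mode of Beta(a, b), the MAP point estimate; valid when a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)


def credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Approximate equal-tailed credible interval from sorted posterior samples."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical setup: prior Beta(2, 2), then 7 successes observed in 10 trials.
post_a, post_b = update_beta(2, 2, successes=7, trials=10)

theta_map = map_estimate(post_a, post_b)           # point estimation
lower, upper = credible_interval(post_a, post_b)   # interval estimation

# Under this model the posterior mean is also the predicted probability
# that the next trial is a success.
posterior_mean = post_a / (post_a + post_b)

print(f"MAP = {theta_map:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

Observing more data would simply call `update_beta` again with the current posterior as the new prior, which is the dynamic updating described under the advantages.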
