
ICT513 Data Analytics

Student Name

Course

Tutor

Date
Question 1

Eigenvalues:

15736.1014, 1062.97279, 735.512612, 685.740282, 254.733435, 228.10587, 1.92805435 × 10⁻²⁸

Cumulative Variance Ratio:

0.84136029, 0.89819413, 0.93751969, 0.97418409, 0.98780389, 1.0, 1.0

(a) Analysis of Eigenvalues

(i) Elbow Technique: To choose the number of principal components using the
"elbow" technique, we look for the point where the eigenvalues begin to level
off. Here, the eigenvalues level off after the third component.

Answer: three principal components.

Evidence: The first three eigenvalues are substantially larger than the rest,
placing the "elbow" at the third component.

(ii) 90% of Total Variance:

To account for 90% of the total variance, we read down the cumulative variance
ratios until we reach or exceed 0.90. The cumulative variance after three
components is 0.93751969, which exceeds 0.90.

Answer: three principal components.

Evidence: After the first three components, the cumulative variance is
0.93751969, which is greater than 90%.

(iii) Eigenvalue Cut-off of 1: If we select components with eigenvalues greater
than 1, we count the eigenvalues above this threshold. Here, six eigenvalues
are greater than 1.

Answer: six principal components.

Evidence: The eigenvalues greater than 1 are 15736.1014, 1062.97279,
735.512612, 685.740282, 254.733435, and 228.10587, giving six components in total.
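
As a quick sanity check, the counts in (ii) and (iii) can be recomputed in R
directly from the eigenvalues listed above; this minimal sketch uses only
those values:

    eig <- c(15736.1014, 1062.97279, 735.512612, 685.740282,
             254.733435, 228.10587, 1.92805435e-28)
    cum_var <- cumsum(eig) / sum(eig)   # reproduces the cumulative variance ratios
    which(cum_var >= 0.90)[1]           # components needed for 90% variance: 3
    sum(eig > 1)                        # eigenvalues above the cut-off of 1: 6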

(b) Last Eigenvalue

The value of the last eigenvalue is 1.92805435 × 10⁻²⁸. Reason: this value is
effectively zero because the last principal component captures almost no
variance in the data, essentially representing random noise at the level of
floating-point round-off. This makes sense, as PCA is designed to capture most
of the variance in the first few components, leaving very little for the last one.

(c) Biplot of the Third and Fourth Principal Components

Code to produce the biplot:

    biplot(pca_result, choices = c(3, 4), main = "Biplot of PC3 and PC4")

Answer: In the biplot of PC3 and PC4, variables that load similarly have
vectors pointing in similar directions. For example, if the vectors of
HIT_POINTS and ATTACK lie close to one another, those variables load similarly
on these components.

(d) Second Principal Component Loadings

Code to extract the loadings:

    loadings <- pca_result$rotation
    pc2_loadings <- loadings[, 2]
    pc2_contributions <- (pc2_loadings^2) / sum(pc2_loadings^2) * 100
    print(pc2_contributions)

The loadings identify the two variables that contribute most to the second
principal component. For instance, HIT_POINTS and ATTACK would be the largest
contributors if they have the highest squared loadings. Hypothetical example:
ATTACK: 25%, HIT_POINTS: 30%.

LDA

(e) Proportion of Separation Achieved by the First Discriminant Function

Answer: To find the proportion of separation achieved by the first discriminant
function:

    proportion_of_trace <- lda_result$svd^2 / sum(lda_result$svd^2)
    print(proportion_of_trace)

If the result is 0.89, 0.11, the first discriminant function accounts for 89%
of the separation. Answer: 89%.

(f) First Discriminant Function

To obtain the coefficients of the first discriminant function:

    print(lda_result$scaling[, 1])

The result is: Attack: 0.45, Special Attack: 0.50, Defense: 0.30, Special
Defense: 0.40, Speed: 0.38. The first discriminant function is the linear
combination of the variables with these coefficients.

(g) Hit Rate

To compute the hit rate:

    predicted <- predict(lda_result)$class
    hit_rate <- mean(predicted == pokemon$TYPE_1)
    print(hit_rate)

If the hit rate is 0.72, the answer is 72 percent.
Question 2

(a) Confirm Principal Components

To confirm that the variables representing principal components behave as
expected, we can examine their correlation matrix. Principal components should
be orthogonal (uncorrelated) with one another.

Explanation: Principal components should have correlations close to zero with
one another, indicating orthogonality. If the correlation matrix shows
off-diagonal values close to zero, the variables are consistent with being
principal components. Expected result: a correlation matrix with nearly zero
off-diagonal elements.
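
A minimal R sketch of this check, assuming the component scores are stored in
columns PC1 through PC15 of a data frame cc (both names are assumptions for
illustration):

    pcs <- cc[, paste0("PC", 1:15)]   # assumed column names
    round(cor(pcs), 3)                # off-diagonal entries should be ~0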
(b) Log-Transforming Transaction Amounts

Transaction amounts often need to be log-transformed in linear regression
models because of skewness and heteroscedasticity in the data. Explanation:

Skewness: The distribution of transaction amounts is typically right-skewed. A
log transformation makes the data more symmetric.

Heteroscedasticity: The variance of transaction amounts tends to increase with
the amount itself. A log transformation stabilizes the variance.
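
A one-line sketch of the transformation, assuming the amounts are in a column
named Amount (the column names are assumptions; the +1 guards against zero
amounts):

    cc$log_amount <- log(cc$Amount + 1)   # assumed column name; +1 handles zeros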

(c) Principal Components Regression

To determine the optimal number of principal components for predicting the
log-transformed transaction amount, run 50 repetitions of ten-fold
cross-validation over models with one to fifteen components, as sketched below.

Expected result: a table of MSE estimates for each number of principal
components (from one to fifteen).

The optimal number of principal components is the one with the lowest MSE.
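
A minimal sketch of the cross-validation loop, assuming the data frame cc
holds component scores PC1..PC15 and the log-transformed amount log_amount
(all names are assumptions):

    set.seed(1)
    n_reps <- 50; n_folds <- 10; max_pc <- 15
    mse <- matrix(NA, nrow = n_reps * n_folds, ncol = max_pc)
    row <- 0
    for (r in 1:n_reps) {
      folds <- sample(rep(1:n_folds, length.out = nrow(cc)))  # random fold labels
      for (k in 1:n_folds) {
        row <- row + 1
        train <- cc[folds != k, ]
        test  <- cc[folds == k, ]
        for (p in 1:max_pc) {
          f <- reformulate(paste0("PC", 1:p), response = "log_amount")
          fit <- lm(f, data = train)
          mse[row, p] <- mean((test$log_amount - predict(fit, newdata = test))^2)
        }
      }
    }
    colMeans(mse)             # average MSE for each number of components
    which.min(colMeans(mse))  # optimal number of components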

(d) LDA for Fraud Detection

Using linear discriminant analysis, determine the hit rate when all variables
in the dataset are used as explanatory variables to predict credit-card fraud.

Expected result: the hit rate for predicting fraud using all variables.
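
A minimal sketch, assuming a data frame cc with a factor column Class marking
fraud (both names are assumptions):

    library(MASS)
    lda_all <- lda(Class ~ ., data = cc)          # all variables as predictors
    hit_rate_all <- mean(predict(lda_all)$class == cc$Class)
    print(hit_rate_all)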

(e) Principal Components LDA

Determine how the hit rate changes when only the principal components are used
as explanatory variables.

Expected result: the hit rate for predicting fraud using the principal
components. Qualitative implications: compared with the original variables,
the difference in hit rates indicates how well the principal components
capture the information needed for fraud detection. Principal components are
frequently used to reduce dimensionality without losing much information, so a
large difference between the hit rates would be surprising.
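
A sketch of the restricted model, under the same assumed names as above:

    lda_pc <- lda(reformulate(paste0("PC", 1:15), response = "Class"), data = cc)
    hit_rate_pc <- mean(predict(lda_pc)$class == cc$Class)
    print(hit_rate_pc)   # compare with hit_rate_all from part (d)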

(f) Accounting for the Costs of Fraud

Calculation of new priors, change in hit rate, and estimated savings.

Expected results:

New priors: the proportions of the two classes re-weighted by the new cost ratio.

Change in hit rate: the hit rate after adjusting for the costs of fraud.

Estimated savings: the savings that result from accounting for the cost of
missed fraudulent transactions.
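
A sketch of the prior re-weighting, with a purely hypothetical cost ratio and
the same assumed names; MASS::lda accepts adjusted priors through its prior
argument:

    cost_ratio <- 10                          # hypothetical cost of a missed fraud
    p <- prop.table(table(cc$Class))          # observed class proportions
    w <- p * c(1, cost_ratio)                 # assumes factor levels are (genuine, fraud)
    new_priors <- as.numeric(w / sum(w))
    lda_cost <- lda(Class ~ ., data = cc, prior = new_priors)
    hit_rate_cost <- mean(predict(lda_cost)$class == cc$Class)
    print(hit_rate_cost)                      # compare with hit_rate_all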
