ICT513 Data Analytics
Student Name
Course
Tutor
Date
Question 1
Eigenvalues:
15736.1014, 1062.97279, 735.512612, 685.740282, 254.733435, 228.10587, 1.92805435 × 10^-28
Cumulative Variance Ratio:
0.84136029, 0.89819413, 0.93751969, 0.97418409, 0.98780389, 1.0, 1.0
(a) Analysis of Eigenvalues
(i) Elbow Method: To choose the number of principal components using the "elbow"
method, we look for the point where the eigenvalues begin to level off. Here, the
eigenvalues begin to level off after the third component.
The answer is three principal components.
Evidence: The eigenvalues fall from 15736.10 to 1062.97 to 735.51 across the first
three components, after which the drop to the fourth (685.74) is small, placing the
"elbow" at the third component.
(ii) 90% of Total Variance:
To account for 90% of the total variance, we read along the cumulative variance
ratios until we reach or exceed 0.90. The first two components give 0.89819413,
which falls just short; the first three give 0.93751969.
The answer is three principal components.
Evidence: With three components the cumulative variance ratio is 0.93751969, which
is greater than 90%.
(iii) Eigenvalue Cut-off of 1: If we keep only components with eigenvalues greater
than 1, we count the eigenvalues that exceed this threshold. Here, six eigenvalues
are greater than 1.
The answer is six principal components.
Evidence: The eigenvalues greater than 1 are 15736.1014, 1062.97279, 735.512612,
685.740282, 254.733435, and 228.10587, giving six components.
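The 90% and cut-off criteria can be checked numerically. The following is a minimal R sketch that re-enters the eigenvalues listed above by hand; the variable names (eigenvalues, cum_var, n_90, n_gt1) are illustrative and not part of the original analysis.

```r
# Eigenvalues as reported above (the last one is numerically zero).
eigenvalues <- c(15736.1014, 1062.97279, 735.512612, 685.740282,
                 254.733435, 228.10587, 1.92805435e-28)

# Proportion and cumulative proportion of variance explained.
prop_var <- eigenvalues / sum(eigenvalues)
cum_var  <- cumsum(prop_var)
print(round(cum_var, 4))

# (ii) Smallest number of components whose cumulative proportion reaches 0.90.
n_90 <- which(cum_var >= 0.90)[1]

# (iii) Number of eigenvalues greater than 1.
n_gt1 <- sum(eigenvalues > 1)

print(c(n_90 = n_90, n_gt1 = n_gt1))
```

Here n_90 evaluates to 3 and n_gt1 to 6, in line with the answers to (ii) and (iii).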
(b) Last Eigenvalue
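The value of the last eigenvalue is 1.92805435 × 10^-28.
Reason: This value is zero to within floating-point precision, so the last principal
component captures essentially none of the variance in the data. This is expected in
part because PCA concentrates most of the variance in the first few components,
leaving very little for the last. A value this small, however, usually indicates
that one variable is an (almost) exact linear combination of the others, so the data
effectively have one fewer dimension, rather than that the component is merely
picking up random noise.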
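One common cause of an eigenvalue this close to zero is an exact linear dependence among the variables. The R sketch below illustrates this on simulated data, not on the assignment data: the variables x1, x2, x3 and total are hypothetical, with total built as the exact sum of the other three.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: three independent variables plus their exact sum.
x1 <- rnorm(100, mean = 50, sd = 10)
x2 <- rnorm(100, mean = 60, sd = 15)
x3 <- rnorm(100, mean = 70, sd = 20)
toy <- data.frame(x1, x2, x3, total = x1 + x2 + x3)

# PCA on the covariance matrix (prcomp does not scale by default).
toy_pca <- prcomp(toy)
toy_eigenvalues <- toy_pca$sdev^2
print(toy_eigenvalues)

# The last eigenvalue is zero up to floating-point error.
last_ratio <- toy_eigenvalues[4] / toy_eigenvalues[1]
print(last_ratio)
```

Whether the same mechanism explains the eigenvalue reported above depends on the variables included in the PCA, which would need to be checked against the data.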
(c) Biplot of the Third and Fourth Principal Components
Code to produce the biplot:
biplot(pca_result, choices = c(3, 4), main = "Biplot of PC3 and PC4")
Answer: In the biplot of PC3 and PC4, variables that load similarly have vectors
pointing in similar directions. For instance, the vectors of HIT_POINTS and ATTACK,
which lie close to one another, load similarly on these components.
(d) Second Principal Component Loadings
Code to extract the loadings:
loadings <- pca_result$rotation
pc2_loadings <- loadings[, 2]
pc2_contributions <- (pc2_loadings^2) / sum(pc2_loadings^2) * 100
print(pc2_contributions)
The loadings identify the two variables that contribute most to the second principal
component. HIT_POINTS and ATTACK, for instance, would be the largest contributors if
they have the largest absolute loadings.
Hypothetical example: ATTACK: 25%, HIT_POINTS: 30%.
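The contribution calculation can be tried end to end on a dataset that ships with R. The sketch below uses the built-in USArrests data as a stand-in, since the assignment data are not reproduced here; pca_demo and top_two are illustrative names, and scaling the variables is an assumption made for this example only.

```r
# Stand-in data: USArrests ships with base R (not the assignment data).
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Loadings are the columns of the rotation matrix.
loadings <- pca_demo$rotation
pc2_loadings <- loadings[, 2]

# Percentage contribution of each variable to PC2.
# Each column of the rotation matrix has unit length, so the
# denominator equals 1; it is kept to mirror the code above.
pc2_contributions <- (pc2_loadings^2) / sum(pc2_loadings^2) * 100
print(round(pc2_contributions, 1))

# Names of the two variables contributing most to PC2.
top_two <- names(sort(pc2_contributions, decreasing = TRUE))[1:2]
print(top_two)
```

Squaring the loadings removes their sign, so the ranking reflects the size of each variable's contribution and not its direction.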
LDA
(e) Degree of Separation Achieved by the First Discriminant Function
To find the degree of separation achieved by the first discriminant function:
proportion_of_trace <- lda_result$svd^2 / sum(lda_result$svd^2)
print(proportion_of_trace)
If the result is 0.89, 0.11, the first discriminant function accounts for 89% of the
separation.
Answer: 89%
(f) First Discriminant Function
To obtain the first discriminant function:
print(lda_result$scaling[, 1])
The result is: Attack: 0.45, Special Attack: 0.50, Special Defense: 0.30, Special
Defense: 0.40, Speed: 0.38. The first discriminant function is the linear
combination of the variables weighted by these coefficients.
(g) Hit Rate
To compute the hit rate:
predicted <- predict(lda_result)$class
hit_rate <- mean(predicted == pokemon$TYPE_1)
print(hit_rate)
If the hit rate is 0.72, the answer is 72%.
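The three LDA steps in (e) to (g) can be run together on a built-in dataset. The sketch below uses iris as a stand-in for the Pokemon data, which are not reproduced here; lda_demo is an illustrative name, and the MASS package is assumed to be installed (it ships with standard R distributions).

```r
library(MASS)  # provides lda()

# Stand-in data: iris ships with base R (not the assignment data).
lda_demo <- lda(Species ~ ., data = iris)

# (e) Proportion of between-group separation per discriminant function.
proportion_of_trace <- lda_demo$svd^2 / sum(lda_demo$svd^2)
print(round(proportion_of_trace, 4))

# (f) Coefficients of the first discriminant function.
print(lda_demo$scaling[, 1])

# (g) Hit rate: share of observations assigned to their true class.
predicted <- predict(lda_demo)$class
hit_rate <- mean(predicted == iris$Species)
print(hit_rate)
```

The hit rate computed this way uses the same observations for fitting and prediction, so it tends to be optimistic compared with a cross-validated estimate.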
Question 2
(a) Confirm Principal Components
To confirm that the variables representing principal components behave as we would
expect, we can inspect their correlation matrix. Principal components should be
orthogonal (uncorrelated) with one another.
Explanation: Principal components should show almost no correlation with one
another, indicating orthogonality. The variables are consistent with principal
components if the off-diagonal values of the correlation matrix are close to zero.
Expected Results: a correlation matrix with nearly zero off-diagonal elements.
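The orthogonality check can be sketched as follows. The example computes principal component scores from the built-in USArrests data as a stand-in; in the assignment the component variables are already columns of the dataset, so only the cor() step would apply to them. The names pca_demo, score_cor and max_abs_off_diag are illustrative.

```r
# Stand-in: compute PC scores from a built-in dataset.
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Correlation matrix of the principal component scores.
score_cor <- cor(pca_demo$x)
print(round(score_cor, 10))

# Largest absolute off-diagonal correlation; should be ~0 for true PCs.
off_diag <- score_cor[upper.tri(score_cor)]
max_abs_off_diag <- max(abs(off_diag))
print(max_abs_off_diag)
```

A second consistency check, under the same assumptions, is that the variances of the components decrease from the first to the last.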
(b) Log-Transforming Transaction Amounts
Transaction amounts often need to be log-transformed in linear regression models
because of skewness and heteroscedasticity in the data.
Explanation:
Skewness: The distribution of transaction amounts is typically right-skewed. A log
transformation can make the data more symmetric.
Heteroscedasticity: The variance of transaction amounts can increase with the amount
itself. A log transformation can stabilise the variance.
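The effect on skewness can be illustrated with simulated data. The sketch below draws hypothetical right-skewed amounts from a log-normal distribution (an assumption made purely for illustration) and compares a simple moment-based skewness measure before and after the transformation; the helper skewness() is defined here and is not a base R function.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical right-skewed transaction amounts (not the assignment data).
amount <- rlnorm(1000, meanlog = 4, sdlog = 1)

# Simple moment-based sample skewness (helper defined for this sketch).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skewness(amount)
skew_log <- skewness(log(amount))
print(c(raw = skew_raw, log = skew_log))
```

If the real amounts include zeros, log() returns -Inf for them; log1p(amount), which computes log(1 + amount), is one common workaround.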
(c) Principal Component Regression
To determine the optimal number of principal components for predicting the
log-transformed transaction amount, we use 50 repetitions of 10-fold
cross-validation and compare the resulting MSE estimates.
Expected Result: a table containing MSE estimates for each number of principal
components (from one to fifteen).
The optimal number of principal components is the one with the lowest MSE.
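Repeated k-fold cross-validation for principal component regression can be sketched in base R as below. This is a rough illustration under stated assumptions: the data are simulated (5 predictors in place of the 15 components), the PCA is refitted inside each training fold, and all names (X, y, cv_mse, best_k) are hypothetical. If the predictors are already principal components, the prcomp() step can be dropped and the columns used directly.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: 200 rows, 5 predictors, response driven by the first two.
n <- 200
p <- 5
X <- matrix(rnorm(n * p), nrow = n)
y <- 2 * X[, 1] - X[, 2] + rnorm(n)

n_reps  <- 50   # repetitions of the cross-validation
n_folds <- 10   # folds per repetition
mse <- matrix(NA_real_, nrow = n_reps, ncol = p)

for (r in seq_len(n_reps)) {
  # Random assignment of rows to folds for this repetition.
  fold <- sample(rep(seq_len(n_folds), length.out = n))
  sq_err <- matrix(NA_real_, nrow = n, ncol = p)

  for (f in seq_len(n_folds)) {
    test <- fold == f
    # Fit the PCA on the training rows only, then project the test rows.
    pca <- prcomp(X[!test, , drop = FALSE])
    train_scores <- pca$x
    test_scores  <- predict(pca, newdata = X[test, , drop = FALSE])

    for (k in seq_len(p)) {
      # Least-squares fit on the first k components plus an intercept.
      fit  <- lm.fit(cbind(1, train_scores[, 1:k, drop = FALSE]), y[!test])
      pred <- cbind(1, test_scores[, 1:k, drop = FALSE]) %*% fit$coefficients
      sq_err[test, k] <- (y[test] - pred)^2
    }
  }
  mse[r, ] <- colMeans(sq_err)
}

# Average MSE over repetitions for each number of components.
cv_mse <- colMeans(mse)
best_k <- which.min(cv_mse)
print(round(cv_mse, 3))
print(best_k)
```

Averaging over repetitions reduces the dependence of the MSE estimates on any single random split into folds.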
(d) LDA for Fraud Detection
Using linear discriminant analysis, determine the hit rate when all variables in the
dataset are used as explanatory variables to predict credit card fraud.
Expected Result: The hit rate obtained when using all variables to predict fraud.
(e) LDA on Principal Components
Determine how the hit rate changes when only the principal components are used as
explanatory variables.
Expected Result: The hit rate for predicting fraud using the principal components.
Qualitative Implications: Compared with the original variables, the difference in
hit rates indicates how well the principal components capture the information needed
for fraud detection. Principal components are frequently used to reduce
dimensionality without losing a significant amount of information, so it may be
surprising if the hit rates are significantly different.
(f) Accounting for the Costs of Fraud
Calculation of New Priors:
Change in Hit Rate:
Estimated Savings:
Expected Results:
New Priors: Proportions of the two classes under the new cost ratio.
Change in Hit Rate: Hit rate after adjusting for fraud costs.
Estimated Savings: The savings that result from taking into account the cost of
missing fraudulent transactions.
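One common way to fold misclassification costs into LDA is to reweight each class prior by the cost of misclassifying an observation from that class and then renormalise. The R sketch below shows that calculation under assumed inputs: the fraud share (0.002) and the cost ratio (10) are hypothetical placeholders, not values from the assignment, and adjust_priors() is a helper defined for this sketch.

```r
# Hypothetical inputs (placeholders, not taken from the assignment data).
p_fraud    <- 0.002  # assumed share of fraudulent transactions
cost_ratio <- 10     # assumed: a missed fraud costs 10 times a false alarm

# Reweight each class prior by its misclassification cost, then renormalise.
adjust_priors <- function(p_fraud, cost_ratio) {
  w <- c(legit = (1 - p_fraud) * 1, fraud = p_fraud * cost_ratio)
  w / sum(w)
}

new_priors <- adjust_priors(p_fraud, cost_ratio)
print(round(new_priors, 5))
```

The resulting vector could then be supplied through the prior argument of MASS::lda(), in the order of the class factor levels, and the hit rate recomputed for comparison with the unadjusted model.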