Machine Learning with NumPy
School of AI Kuala Lumpur
Husein Zolkepli
Bayes' theorem for text classification

P(C | X) = P(X | C) * P(C) / P(X)

Likelihood, P(X | C): probability of vector X given class C.
Prior, P(C): probability of class C occurring.
Posterior, P(C | X): probability of class C given vector X.
Marginal, P(X): probability of vector X; in most cases, vector X is unobserved.
Rebranding Bayes' theorem
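One standard way to do this rebranding, assuming the usual naive independence between words: for a word vector X = (x1, ..., xn), the marginal P(X) is the same for every class, so it can be dropped, leaving

P(C | x1, ..., xn) ∝ P(C) * P(x1 | C) * P(x2 | C) * ... * P(xn | C)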
Text classification

index  i  like  chicken  meat  label
1      1  1     1        0     0
2      1  1     0        1     1
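To make the table concrete, here is a minimal naive Bayes sketch in numpy. The Laplace smoothing constant alpha and the test sentence are assumptions for illustration, not from the slides:

import numpy as np

# bag-of-words features from the table: columns i, like, chicken, meat
X = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1]])
y = np.array([0, 1])  # label column

def fit_naive_bayes(X, y, alpha=1.0):
    # prior P(C) and per-word likelihood P(x_i = 1 | C), with Laplace smoothing
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    likelihoods = np.array([(X[y == c].sum(axis=0) + alpha) /
                            ((y == c).sum() + 2 * alpha) for c in classes])
    return classes, priors, likelihoods

def predict(x, classes, priors, likelihoods):
    # posterior ∝ prior * product of per-word likelihoods
    probs = priors * np.prod(np.where(x == 1, likelihoods, 1 - likelihoods), axis=1)
    return classes[np.argmax(probs)]

classes, priors, likelihoods = fit_naive_bayes(X, y)
print(predict(np.array([1, 1, 0, 1]), classes, priors, likelihoods))  # -> 1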
K-means

1. Initiate random centroids, or use k-means++.
2. Keep iterating: calculate the distances between individuals and centroids, assign each individual to its nearest centroid, and move each centroid to the mean of its clustered individuals.
3. To plot the elbow, run K-means for N values of k; for each run, calculate the sum of distances between centroids and their grouped individuals, and plot it against k (see the numpy sketch below).
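A minimal numpy sketch of all three steps; the two-blob toy data and the fixed iteration count are assumptions for illustration (and the sketch assumes no cluster empties out):

import numpy as np

def kmeans(X, k, n_iter=100):
    # step 1: initiate random centroids (plain random picks; k-means++ would be smarter)
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # step 2: distance of every individual to every centroid, then re-mean each cluster
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=-1).argmin(axis=1)
        centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    # sum of distances between centroids and their grouped individuals (for the elbow)
    inertia = sum(np.linalg.norm(X[labels == c] - centroids[c], axis=1).sum()
                  for c in range(k))
    return centroids, labels, inertia

# step 3: run K-means for several k and plot the resulting sums to find the elbow
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
print([round(kmeans(X, k)[2], 1) for k in range(1, 6)])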
Principal Component Analysis

1. Visualization

Height (x)  Weight (y)  BMI (z)  Score (a)  Hair length (b)  Age (c)  Steps (d)

It does not make sense to plot this table in a vector space; we have 7 dimensions!
Principal Component Analysis

2. Reduce noise

Let's say you want to study the stress level of a student, based on:

Height (x)  Weight (y)  BMI (z)  Score (a)  Hair length (b)  Age (c)  Steps (d)

Not all of these 7 dimensions bring important information! We want to reject some attributes. Maybe 7 does not hurt much, but what happens if you have 512 * 512 * 3 (image) dimensions?! Insane!
Principal Component Analysis

3. Reduce memory (computer science)

Height (x)  Weight (y)  BMI (z)  Score (a)  Hair length (b)  Age (c)  Steps (d)

Let's say a float takes 1 byte, and we have 7 columns and 1 billion rows:
7 * 1,000,000,000 * 1 = 7,000,000,000 bytes = 7 GB!
Dropping a column will save us 1 GB of memory!
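A quick sanity check of the arithmetic, in plain Python (a real numpy float is 4 or 8 bytes; the 1-byte value here follows the slide's assumption, e.g. np.uint8):

rows, cols, bytes_per_value = 1_000_000_000, 7, 1
print(rows * cols * bytes_per_value / 1e9)        # 7.0 GB for the whole table
print(rows * (cols - 1) * bytes_per_value / 1e9)  # 6.0 GB after dropping one column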
Principal Component Analysis

I have data points.

(scatter plot of orange and blue data points)

Let's say this plane is R^n, but we only visualize it in R^2. I want to visualize the data points on axis 0, which is the x-axis.
(the points projected onto the x-axis)

We cannot distinguish between the oranges and the blues! How about axis 1, which is the y-axis?

(the points projected onto the y-axis)

That is quite okay; only a few data points overlap each other. But we don't want any overlapping, right?!
Principal Component Analysis

The eigenvector of our covariance matrix, a line in R^1, gives the direction to project onto.

I'm too tired, man, to draw them one by one :(

How to make sense of it?
Principal Component Analysis

Example covariance matrices:

[[5, 4],     [[5, -4],     [[5, 0],
 [4, 6]]      [-4, 6]]      [0, 1]]

In the last matrix, the value 1 is the variance on the y-axis, and the 0s mean zero correlation.
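A quick way to see these three shapes with np.cov; the sample data here is made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
pos = np.vstack([x, x + 0.5 * rng.normal(size=1000)])   # positively correlated
neg = np.vstack([x, -x + 0.5 * rng.normal(size=1000)])  # negatively correlated
ind = np.vstack([x, rng.normal(size=1000)])             # uncorrelated
print(np.cov(pos))  # off-diagonals > 0
print(np.cov(neg))  # off-diagonals < 0
print(np.cov(ind))  # off-diagonals ≈ 0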
Principal Component Analysis

Eigenvector [1., 0.] has eigenvalue lambda = 5 for the matrix [[5, 0], [0, 1]]:

import numpy as np

# columns of v are the eigenvectors; l holds the matching eigenvalues
l, v = np.linalg.eig(np.array([[5, 0], [0, 1]]))
l, v
(array([5., 1.]), array([[1., 0.],
       [0., 1.]]))
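Putting it together, a minimal PCA sketch in numpy: center the data, eigendecompose the covariance matrix, and project onto the top eigenvector. The synthetic data, drawn from the first covariance matrix above, is an assumption for illustration:

import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[5, 4], [4, 6]], size=200)

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered.T)             # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eig(cov)
order = np.argsort(eigvals)[::-1]      # sort components by variance explained
components = eigvecs[:, order]

X_1d = X_centered @ components[:, :1]  # project onto the top principal component
print(eigvals[order], X_1d.shape)      # -> (200, 1)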