6COM1044 Machine learning and Neural Computing University of Hertfordshire
Dr. Yi Sun
Practical Session – Principal Component Analysis, 2023-2024 (Semester B)
1. Aim of the Session
In this session, you will perform a Principal Component Analysis (PCA) on a dataset,
namely wine.csv, in Python. By the end of this session, you should know
1) how to load data from a .csv file using Pandas;
2) how to produce a scatter plot in Python using matplotlib.pyplot;
3) how to normalise data using StandardScaler;
4) how to apply PCA to a dataset in Python;
5) how to save figures in Python.
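The five skills listed above fit together in one short pipeline. The sketch below is a minimal illustration, not the notebook's solution. It assumes that PCA comes from scikit-learn (the same package that provides StandardScaler), and it uses scikit-learn's bundled copy of the UCI wine data (`load_wine`) as a stand-in for wine.csv. The file names `wine_copy.csv` and `wine_pca.png` are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the data with Pandas. Stand-in: write scikit-learn's copy of the UCI
# wine data to a (hypothetical) CSV so that pd.read_csv is used exactly as it
# would be on the Canvas file; with the real file, read "wine.csv" instead.
load_wine(as_frame=True).frame.to_csv("wine_copy.csv", index=False)
df = pd.read_csv("wine_copy.csv")

X = df.drop(columns="target").to_numpy()  # 13 chemical measurements per wine
y = df["target"].to_numpy()               # class label (0, 1 or 2) in this copy

# Normalise: StandardScaler gives every feature zero mean and unit variance,
# so features measured on large scales do not dominate the components.
X_std = StandardScaler().fit_transform(X)

# Apply PCA and keep the first two principal components.
pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)  # projected data, shape (n_samples, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Scatter plot of the projected data, coloured by class label.
fig, ax = plt.subplots(figsize=(6, 5))
scatter = ax.scatter(Z[:, 0], Z[:, 1], c=y, cmap="viridis", edgecolor="k")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("Wine data projected onto the first two PCs")
ax.legend(*scatter.legend_elements(), title="Class")

# Save the figure; the file extension (.png here) selects the format.
fig.savefig("wine_pca.png", dpi=150, bbox_inches="tight")
```

Standardising before PCA matters here because the wine features are measured in very different units; without it, the features with the largest numeric ranges would dominate the first component.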
2. Starting with Jupyter Notebook for Python
In this module, you will use Python 3. There are many development environments
for Python; you will use Jupyter Notebook, a browser-based graphical interface to
the IPython (Interactive Python) shell. On Windows:
• click the Windows icon in the bottom left corner.
• search for Jupyter Notebook by typing Jupyter in the search box under All Programs.
• click on Jupyter Notebook.
• click New in Jupyter Notebook, then select Python 3 (ipykernel) from the
pop-up list. You should see a new page with a blank cell where you can type
Python code.
• click File and select Rename from the pop-up list; you can give the file any
name you prefer, for example, practical1.
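Once the new notebook is open, a useful first cell checks that the kernel really is Python 3 and that the libraries this session relies on can be imported. This is a minimal sketch; it assumes scikit-learn is the package providing StandardScaler and PCA, which the handout does not state explicitly.

```python
import sys

import matplotlib
import pandas
import sklearn

# Report the interpreter and library versions available to this kernel.
print("Python", sys.version.split()[0])
print("pandas", pandas.__version__)
print("scikit-learn", sklearn.__version__)
print("matplotlib", matplotlib.__version__)

# Stop early with a clear message if the wrong kernel was selected.
assert sys.version_info.major == 3, "Select the Python 3 (ipykernel) kernel"
```

If any import fails, the missing package needs to be installed before the PCA tasks will run.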
3. Do a Principal Component Analysis on the Wine Dataset
1) Download wine.csv from Canvas under Units → Principal Component
Analysis (PCA) → Practical-PCA, and save it in a folder, for example,
Desktop.
2) Open the file in Excel and check how many data points and how many
features there are. More information on this dataset is available at
the following link:
https://archive.ics.uci.edu/ml/datasets/wine
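The same check can be done in Python with Pandas. The sketch below is illustrative only: it uses scikit-learn's bundled copy of the UCI wine data (`load_wine`) as an assumed stand-in, because the exact layout of the Canvas wine.csv (header row, position of any class column) is not described here.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Stand-in for the Canvas file: scikit-learn's bundled copy of the UCI wine data.
# With the real file you would write df = pd.read_csv("wine.csv") instead
# (check first whether its first row holds column names).
df = load_wine(as_frame=True).frame

pd.set_option("display.max_columns", None)  # show every column, not a truncated view

n_rows, n_cols = df.shape
print(f"{n_rows} rows (data points), {n_cols} columns")
print(df.columns.tolist())  # 13 feature names followed by the 'target' class column
print(df.head())            # first five rows, much like the top of the sheet in Excel
```

In this copy one of the columns is the class label rather than a measurement, so the number of features is one less than the number of columns; whether wine.csv stores a label column in the same way is worth confirming when you open it.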
3) Download pca_wine.ipynb from Canvas under Units → Principal
Component Analysis (PCA) → Practical-PCA, and save it in a folder,
for example, Desktop.
© 2019, 2022 University of Hertfordshire Higher Education Corporation
4) Upload pca_wine.ipynb by clicking the ‘Upload’ button on the Home
Page of Jupyter Notebook. The file should then open on a new page.
5) Follow the instructions given in the notebook to complete the nine tasks
in this session.
Please note that some cells in the notebook are complete but others are
not; in those cells, you need to replace the question marks with proper
Python code. Click the ‘Run’ button in Jupyter Notebook to run each
cell.
4. You can learn more about how to use PCA from a helpful post at the
following link:
https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html
Reference:
J. VanderPlas: Python Data Science Handbook. O'Reilly Media, Inc., Sebastopol, CA (2016).