Practical Manual - Machine Learning Application

PREDICTING PULSAR STARS USING MACHINE LEARNING

Introduction
In the vast expanse of the universe, pulsars, highly magnetised and rapidly rotating neutron stars,
have captivated the curiosity of astronomers and astrophysicists alike. Identifying these celestial
objects from the vast amount of astronomical data is a challenging task that can benefit from the
power of machine learning.
This experiment aims to provide undergraduate students with an exciting opportunity to explore
the fields of astrophysics and machine learning by predicting the presence of pulsar stars using a
classification model. By utilising a popular machine learning algorithm called Random Forest,
participants will delve into the realms of data analysis, feature engineering, and model training.

Objectives
The primary objective of this experiment is to introduce students to the application of machine
learning to the identification of pulsar stars. By the end of the experiment, participants should have
gained hands-on experience in:
1. Understanding the nature and significance of pulsar stars in astrophysics
2. Acquiring and preprocessing astronomical data related to pulsar candidates
3. Exploring and analysing data to identify relevant features
4. Implementing a Random Forest classification model to predict pulsar stars
5. Evaluating the performance of the model and interpreting the results
6. Gaining insights into the challenges and considerations when applying machine learning
to real-world astrophysical problems

Experimental Setup
1. Dataset: Participants will be provided with a carefully curated dataset (HTRU2) containing
various features extracted from radio frequency observations of potential pulsar stars.
These features include statistical measures, signal profiles, and other relevant information.
See the dataset description below for more information. (Note that the original dataset
has been modified for the purpose of this experiment.)
2. Data Preprocessing: Participants will explore the dataset, handle missing values,
address class imbalance, and apply normalization methods.
3. Feature Engineering: Students will identify and engineer relevant features that can enhance
the performance of the classification model. This process will involve extracting
meaningful information and selecting appropriate input variables.
4. Model Training: Using the Random Forest algorithm, participants will construct a
classification model that predicts the presence of pulsar stars from the selected features
as accurately as possible. They will learn how different features affect the performance
of the algorithm and how to reduce underfitting and overfitting.
5. Model Evaluation: The trained model will be evaluated using various performance metrics,
such as accuracy, precision, recall, F1 and F2 scores. Participants will interpret the results
and gain insights into the effectiveness of the model in distinguishing pulsar stars.
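Steps 2 to 5 above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the prescribed solution: the data is synthetic (random numbers standing in for the eight candidate features), and the median imputation, the class_weight="balanced" setting, the number of trees, and the 75/25 split are illustrative choices that the manual does not specify. Feature scaling is left out because tree-based splits are insensitive to monotonic rescaling of a feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, fbeta_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for the candidate table: 8 features, 10% positives.
n_neg, n_pos = 900, 100
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_neg, 8)),  # "noise/RFI" class
    rng.normal(1.5, 1.0, size=(n_pos, 8)),  # "pulsar" class with a shifted mean
])
y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])

# Blank out about 2% of the entries to imitate missing values.
X[rng.random(X.shape) < 0.02] = np.nan

# Stratify so that both splits keep the same class ratio.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = make_pipeline(
    SimpleImputer(strategy="median"),  # fill missing values
    RandomForestClassifier(
        n_estimators=200,
        class_weight="balanced",       # reweight the minority class
        random_state=0,
    ),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)

scores = {
    "accuracy": accuracy_score(y_val, y_pred),
    "precision": precision_score(y_val, y_pred),
    "recall": recall_score(y_val, y_pred),
    "f1": fbeta_score(y_val, y_pred, beta=1),
    "f2": fbeta_score(y_val, y_pred, beta=2),  # weights recall above precision
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The F2 score is reported alongside F1 because it penalises missed pulsars (false negatives) more heavily than false alarms, which is usually the costlier error when positives are rare.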

Conclusion
By participating in this experiment, students will gain valuable hands-on experience in the
interdisciplinary fields of astrophysics and machine learning. They will deepen their understanding
of pulsar stars, develop data analysis skills, and learn to apply machine learning algorithms to real-
world problems. This experiment serves as a stepping stone for students interested in exploring
the fascinating intersection of astronomy and data science.

References
Students may use the following references in order to complete the experiment.
• Python (recommended version 3.9.0): https://www.python.org/downloads/release/python-390/
• Pandas (for data handling)
  o Documentation: https://pandas.pydata.org/docs/
  o Tutorials: https://www.w3schools.com/python/pandas/default.asp
• Scikit-learn (to build Random Forest classifiers): https://scikit-learn.org/stable/
• Learn About Random Forest From Scratch (external article):
  https://www.linkedin.com/pulse/random-forest-explained-regression-classification-tasks-sellahewa?utm_source=share&utm_medium=member_android&utm_campaign=share_via

DATASET DESCRIPTION

******************************************************************************
# HTRU2

© Author: Rob Lyon, School of Computer Science & Jodrell Bank Centre for Astrophysics,
University of Manchester, Kilburn Building, Oxford Road, Manchester M13 9PL.
Web: http://www.scienceguyrob.com or http://www.cs.manchester.ac.uk
or alternatively http://www.jb.man.ac.uk
******************************************************************************

1. Overview

HTRU2 is a data set which describes a sample of pulsar candidates collected during the High Time
Resolution Universe Survey (South). Pulsars are a rare type of neutron star that produce radio
emission detectable here on Earth. They are of considerable scientific interest as probes of
space-time, the interstellar medium, and states of matter.
As pulsars rotate, their emission beam sweeps across the sky, and when this crosses our line of
sight, produces a detectable pattern of broadband radio emission. As pulsars rotate rapidly, this
pattern repeats periodically. Thus pulsar search involves looking for periodic radio signals with
large radio telescopes. Each pulsar produces a slightly different emission pattern, which varies
slightly with each rotation. Thus a potential signal detection, known as a 'candidate', is averaged
over many rotations of the pulsar, as determined by the length of an observation. In the absence of
additional information, each candidate could potentially describe a real pulsar. However, in
practice almost all detections are caused by radio frequency interference (RFI) and noise, making
legitimate signals hard to find.
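The averaging over many rotations described above is commonly called folding. The sketch below illustrates the idea on a simulated time series. It is a minimal illustration only: the function name fold, the sampling time, the period, the pulse shape, and the noise level are all made-up assumptions, and it does not reproduce the survey's actual processing pipeline.

```python
import numpy as np

def fold(samples, sample_time, period, n_bins=32):
    """Average a time series into n_bins rotational-phase bins at a trial period."""
    t = np.arange(len(samples)) * sample_time
    phase = (t / period) % 1.0
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins, weights=samples, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    return sums / np.maximum(counts, 1)  # guard against empty bins

rng = np.random.default_rng(1)
sample_time = 0.001  # seconds per sample (made-up value)
period = 0.25        # rotation period in seconds (made-up value)

# Simulate 60 s of data: a narrow pulse at phase 0.3, buried in unit-variance noise.
t = np.arange(60_000) * sample_time
phase = (t / period) % 1.0
pulse = np.exp(-0.5 * ((phase - 0.3) / 0.02) ** 2)
signal = 0.5 * pulse + rng.normal(0.0, 1.0, size=t.size)

profile = fold(signal, sample_time, period)
peak_bin = int(np.argmax(profile))
print("peak phase bin:", peak_bin)
```

In a single rotation the pulse is weaker than the noise, but averaging 240 rotations reduces the noise in each phase bin enough for the pulse to stand out in the folded profile.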
Machine learning tools are now being used to automatically label pulsar candidates to facilitate
rapid analysis. Classification systems in particular are being widely adopted, which treat the
candidate data sets as binary classification problems. Here the legitimate pulsar examples are a
minority positive class, and spurious examples the majority negative class. At present multi-class
labels are unavailable, given the costs associated with data annotation. The data set shared here
contains 16,259 spurious examples caused by RFI/noise, and 1,639 real pulsar examples. These
examples have all been checked by human annotators. Each candidate is described by 8 continuous
variables. The first four are simple statistics obtained from the integrated pulse profile (folded
profile). This is an array of continuous variables that describe a longitude-resolved version of the
signal that has been averaged in both time and frequency. The remaining four variables are
similarly obtained from the DM-SNR curve. These are summarized below:
1. Mean of the integrated profile.
2. Standard deviation of the integrated profile.
3. Excess kurtosis of the integrated profile.
4. Skewness of the integrated profile.
5. Mean of the DM-SNR curve.
6. Standard deviation of the DM-SNR curve.
7. Excess kurtosis of the DM-SNR curve.
8. Skewness of the DM-SNR curve.
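The four statistics in the list above can be computed from any one-dimensional array, whether an integrated profile or a DM-SNR curve. The sketch below uses population moments (no bias correction). That is an assumption: this manual does not state which estimator conventions PulsarFeatureLab applies, so small numerical differences from the provided feature values are possible. The function name and the toy profile are hypothetical.

```python
import numpy as np

def profile_statistics(values):
    """Return (mean, standard deviation, excess kurtosis, skewness) of an array."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    std = x.std()                            # population standard deviation (ddof=0)
    z = (x - mean) / std                     # standardised values
    excess_kurtosis = np.mean(z ** 4) - 3.0  # zero for a Gaussian distribution
    skewness = np.mean(z ** 3)
    return mean, std, excess_kurtosis, skewness

# Toy "integrated profile": flat baseline with one narrow peak (made-up values).
toy_profile = [0.1, 0.0, 0.2, 0.1, 4.0, 9.5, 3.8, 0.2, 0.1, 0.0]
mean, std, kurt, skew = profile_statistics(toy_profile)
print(f"mean={mean:.3f} std={std:.3f} excess_kurtosis={kurt:.3f} skewness={skew:.3f}")
```

A profile dominated by a single narrow peak has strongly positive skewness, whereas a noise-like profile tends to have skewness close to zero.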

For the purpose of this experiment, the dataset is divided into two files:
pulsar_data_train.csv and pulsar_data_test.csv. You must use the training data file to train your
Random Forest model. After training, use the test data file to evaluate the performance of the
trained model. In both files, each candidate is stored in a separate row. Each row lists the
variables first, and the class label is the final entry. The class labels used are 0 (negative, not a
pulsar star) and 1 (positive, a pulsar star).
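The file layout described above (features first, class label last) can be read with pandas roughly as follows. This is a hedged sketch: to stay self-contained it parses a tiny made-up table instead of the real file, and the column names f1 to f8 and label are hypothetical. The actual files may use different header names or none at all.

```python
import io
import pandas as pd

# Tiny made-up stand-in for pulsar_data_train.csv: 8 feature columns, label last.
# All values and header names here are invented for illustration.
toy_csv = io.StringIO(
    "f1,f2,f3,f4,f5,f6,f7,f8,label\n"
    "140.6,55.7,-0.23,-0.70,3.2,19.1,7.98,74.2,0\n"
    "102.5,58.9,0.47,-0.52,1.7,14.9,10.58,127.4,0\n"
    "99.4,41.6,1.55,4.15,27.6,,2.96,7.2,1\n"
    "136.8,57.2,-0.07,-0.64,3.6,20.9,7.23,53.6,0\n"
)
train = pd.read_csv(toy_csv)  # in practice: pd.read_csv("pulsar_data_train.csv")

X_train = train.iloc[:, :-1]             # every column except the last
y_train = train.iloc[:, -1].astype(int)  # the class label is the final entry

missing_per_column = X_train.isna().sum()             # empty cells are read as NaN
class_counts = y_train.value_counts().sort_index()    # examples per class
majority_baseline = class_counts.max() / class_counts.sum()

print(missing_per_column)
print(class_counts)
print(f"accuracy of always predicting the majority class: {majority_baseline:.3f}")
```

The majority baseline is worth computing before training. With the class counts quoted for HTRU2 (16,259 negatives out of 17,898 candidates), always predicting "not a pulsar" would already reach about 90.8% accuracy while finding no pulsars at all. The counts in the modified files for this experiment may differ.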

Please note that the data contains no positional information or other astronomical details.
It is simply feature data extracted from candidate files using the PulsarFeatureLab tool.

2. Abbreviations
DM: The dispersion measure, which is the integrated column density of free electrons along the
line of sight. It sets the frequency-dependent delay that a radio signal experiences as it travels
through the interstellar medium.
SNR: The signal-to-noise ratio, which is a measure of the strength of the radio signal relative to
the noise.
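As a worked illustration of why the DM matters, the sketch below uses the standard cold-plasma approximation, in which lower radio frequencies arrive later by an amount proportional to the DM. The constant is the commonly quoted approximate value, and the DM and observing frequencies are made-up example inputs rather than survey parameters.

```python
# Approximate dispersion constant, in ms GHz^2 per (pc cm^-3).
DISPERSION_CONSTANT_MS = 4.148808

def dispersion_delay_ms(dm, freq_low_ghz, freq_high_ghz):
    """Extra arrival delay (ms) at freq_low relative to freq_high for a given DM."""
    return DISPERSION_CONSTANT_MS * dm * (freq_low_ghz ** -2 - freq_high_ghz ** -2)

# Example inputs (made-up): DM = 100 pc cm^-3, band from 1.2 GHz to 1.5 GHz.
delay = dispersion_delay_ms(dm=100.0, freq_low_ghz=1.2, freq_high_ghz=1.5)
print(f"delay across the band: {delay:.1f} ms")
```

A search corrects for this delay at many trial DM values and records the resulting SNR for each one. The DM-SNR curve summarised by features 5 to 8 is that record.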

****************
END OF THE EXPERIMENT
