RUNNING HEAD: MACHINE LEARNING PRINCIPLES
Machine Learning Principles and Linear Regression Modeling
Kyle Phillips
Author’s Note
I completed my shadowing/summer experience with Old Dominion University’s Remote
Experience for Young Engineers and Scientists (REYES) program with Dr. Raul Briceno. Dr.
Briceno was a professor of nuclear physics at ODU before moving to UC Berkeley within
the past year. I would like to express my gratitude to Dr. Briceno and all the amazing
faculty and lecturers who volunteered their time to the REYES program and made the sessions
so informative and enjoyable. Additionally, I would like to thank Mrs. Graves, my friends, and
my family for their support and encouragement throughout the entire process of Senior Project
and all it entailed. Finally, if you would like to contact me about my project or experiences,
please do not hesitate to do so at the following email address: [email protected].
Introduction
Machine learning is a field that is rapidly gaining traction and influence in all types of
industries and applications. A 2020 survey found that 2 out of 3 businesses were using
machine learning in some capacity and that 97% were using it or planned to implement it within
the next year [2]. Additionally, the market value of machine learning as a service is projected
to grow by almost 40% a year until 2030, demonstrating the substantial growth still ahead for
the field [4]. Machine learning is currently used for data analysis, novel products, and pattern
recognition, among other applications. For example, tech companies use machine learning in
products such as the virtual assistants Siri and Alexa. Additionally, retail companies may use
machine learning to recommend products or increase the effectiveness of their advertising [3].
All of these applications demonstrate the versatility and overall impact
machine learning has on society, an impact that is only expected to grow in the future. As
demand for the field increases, so does the need to train and prepare industry professionals
to meet it. In my personal life, I have witnessed the expansion of machine learning all around
me. Many homes now contain some “smart” technology that uses machine learning for
language recognition or another purpose. Additionally, Internet-based services have been quick
to implement machine learning, with a contemporary example being social media sites like
TikTok. TikTok’s algorithm, which curates For You pages all around the world, relies on
machine learning and data analytics. With this strong interest in computer science and
machine learning, I sought out a summer experience that would allow me to pursue these fields
as well as others.
Old Dominion University hosts an annual program called the Remote Experience for
Young Engineers and Scientists (REYES). This program offers individuals of all backgrounds
increased access to STEM information and opportunities from all over the world. The
program offers a variety of virtual lectures on topics ranging from the science of ants to
cybersecurity. These lectures are broadcast live on the Internet, where viewers can join
sessions and ask questions in real time, and are also posted on YouTube afterward for
interested scholars to view at their leisure. This is extremely helpful because it lets the
program adapt to complex schedules. It also allows participants to pursue the topics that
interest them most. Additionally, in
recent years the program has begun developing a mentorship program in which students can apply
to small focus groups led by a distinguished professor specializing in a specific field. For
example, this year there were mentorships on nuclear physics, computer science, and more. Each
of these mentorship programs is unique in its own way with different requirements and
management but all of them are extremely informative and educational.
What I Knew
Ever since middle school I have been interested in computer science and the creation of
impactful, beneficial technologies and programs. This interest increased with my entrance into
the Math and Science Academy as I was able to select a variety of courses in many different
fields to pursue my interests. One such course was AP Computer Science A, which I took during
my sophomore year. I had been interested in coding for a while but at the time it was a fairly
mystical concept that I did not fully comprehend. I remember panicking and frantically texting
and contacting friends and peers on how to print out the phrase “Hello World” for the summer
assignment. However, I quickly calmed down and became very interested in the material over
the course of the school year, and I would often challenge myself with practice problems and
experiment with creating different programs. AP Computer Science A was largely focused on the
Java programming language; however, other programming languages such as Python are more
commonly used for machine learning. Thus, I knew that this was one area in which I had
room to grow throughout my project [6].
Then, during my junior year I elected to enroll in AP Computer Science Principles, a
more general computer science course that focused more on the concepts and ideas behind
programming and computer science in general rather than the more narrow focus that AP CSA
established on the Java programming language and syntax. It was during this class that I was first
introduced to machine learning during an exploration activity using an image-based machine
learning program that we trained and tested ourselves. Although I found the activity extremely
interesting, I grew more and more curious about how the program actually worked. When I tried
to imagine how it would be coded using either the Java from AP CSA or the block coding from
AP CSP, I found myself at a loss. It was this curiosity that planted the seed for my senior
project.
As I began to outline my project and categorize my interest, I was drawn to specific
topics of investigation in machine learning. My broad overarching questions were about the
functioning and implementation of machine learning in the real world. I wanted to know more
about how machine learning worked and how it was able to be such a powerful tool around the
world. Because there were so many applications, I knew that there must be different methods and
styles of programs for the different types of problems. It was due to this that I wished to
investigate unique types of machine learning models. Additionally, I was interested in how
computers are able to take in data as input, analyze it, and winnow out the
valuable information and patterns in order to learn and produce an output. The
computer’s process was particularly interesting to me after I saw some real-world applications of
machine learning that processed data in forms such as images and videos, which were
very different from the data structures I was familiar with at the time: lists, arrays, and standard
variables. Along this line, I was further curious about how computers took the
analyzed data and produced predictive models and results, such as sorting images into pictures
of cats and pictures of dogs. In general, machine learning seemed a mystifying field that
was abstract in both functionality and application, and I hoped to pierce through the
confusion to gain a deeper and more holistic understanding of the field and its processes.
After this reflection on my interests and my knowledge of the field, I came up with several
questions to pursue during my project. What are some types of machine learning models? How
can computers process and transform data into accurate predictions? How are different types of
data analyzed? Why is machine learning so beneficial? What differentiates machine learning
from artificial intelligence?
My Story
I first began with building my capacity and preparing myself to delve deeper into
machine learning. This meant I needed to improve my programming skills as well as do some
introductory research into the concepts of machine learning. Thus, I began working through a
series of Python tutorials on YouTube to glean the basic syntax and methods that may be used in
machine learning. Every programming language has slightly different rules and formatting styles
that must be followed for a program to function properly. Thus, it is important to review how
each language you use differs from the others. For example, I had
learned Java during AP CSA, but Java’s syntax and rules are quite different from Python’s. Java
requires each statement, or command, to end with a semicolon, while Python is more
flexible and simply requires the line to end. Additionally, Java requires the lines of code
within loops and other structures to be enclosed in braces, while Python
uses indentation to separate the levels of code. These small changes may seem
insignificant, but if the rules of syntax are not followed, the code will not compile or run;
the compiler or interpreter will report errors, and a debugging process will need to occur. In
addition to familiarity with Python, I also wanted to gain a little more experience and clarity with
machine learning specifically. After speaking with current seniors such as Ayush Jain about their
experiences with machine learning I was introduced to a website called DeepLizard which
contained helpful introductory tutorials into the concepts and ideas of machine learning without
getting too stuck in the specifics of programming the models.
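The syntax differences described above can be seen in a minimal Python sketch. This is an illustration only, not code from the project; the function name count_even and the sample list are made up for the example. Statements end at the end of the line, and indentation alone marks which lines belong to the loop and to the if statement.

```python
# Python marks block structure with indentation rather than braces,
# and a statement ends at the end of the line (no semicolon needed).
def count_even(numbers):
    """Return how many values in `numbers` are even."""
    total = 0
    for n in numbers:       # the loop body is the indented block below
        if n % 2 == 0:      # a nested block is indented one level further
            total += 1
    return total

# A roughly equivalent Java loop would need braces and semicolons:
#   for (int n : numbers) { if (n % 2 == 0) { total++; } }
print(count_even([1, 2, 3, 4, 6]))
```

Removing the indentation from the lines inside the loop would make Python reject the file with an IndentationError before running any of it.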
Next, over the summer I began my experience with the REYES virtual program hosted
by Old Dominion University. I also ran into my first problem: I had been unaware that the
program was testing a larger version of its mentorship programs, and I missed the opportunity to
apply to some of the more competitive programs in computer science. Luckily,
however, I was able to join a nuclear physics mentorship group under Dr. Raul Briceno that
featured guest lecturers from universities all around the world and covered topics such as
introductions to nuclear physics, statistical analysis, and finally machine learning in nuclear
physics. Throughout this program I was afforded the opportunity to attend sessions offered by
the general REYES program as well as by the nuclear physics mentorship group. Both resources
were extremely beneficial and interesting, as they covered a broad scope of topics ranging from
the science of ants to the field of cybersecurity. This wealth of information allowed me to select
the sessions that I felt would be most beneficial to my research or that simply sparked my
curiosity.
In the nuclear mentorship group, there would often be two paired lessons each week with
the first introducing a topic and the second showing applications or otherwise expounding upon
that knowledge. For example, one such pair was on statistical analysis in machine learning:
the first lesson introduced topics in statistics like probability tables, measures of central
tendency, and data manipulation, and the second lesson provided practice exercises and
exploration. This process continued for several different types of lessons, all sharing a
common connection to nuclear physics. There were lessons on statistics one week and on
machine learning the next, with both connecting back to the overarching theme of nuclear
physics. This variety of content was a significant challenge for me, as I was
not well versed in the field of nuclear physics, and it took effort to
fully digest the lecturers’ references and examples. However, even when I could not decipher
their specific jargon, I was able to glean the larger themes and concepts, especially if they
were already somewhat familiar to me.
The statistics lessons are perhaps the best example of this persistence in the pursuit of
knowledge. I had taken AP Statistics during my junior year and was thus familiar with
many of the equations and concepts covered in the lesson. However, I did notice several
small differences between the two experiences, such as variables written with different symbols.
These differences were interesting, as I was able to witness the application of
the tools and methods I had just learned in school directly in a field as advanced as nuclear
physics. I deeply enjoyed the REYES program and the lessons on machine learning further
increased my passion for the computer science and machine learning fields. Thus, I felt driven to
continue researching the fields through an independent study over the rest of the summer.
Throughout the rest of the summer, I began exploring and researching both my
overarching and sub questions about machine learning. I first began by compiling a selection of
resources in machine learning that seemed to be reputable, informative, and beneficial. An
incredible resource that I discovered was a Google Colaboratory tutorial on machine learning
and regression analysis. I found this to be very informative, as it not only explained the process of
creating a simple machine learning program but also provided visualizations and opportunities
for deeper exploration along the way. I worked through this 30+ hour course through the end of
the summer and gained a much clearer understanding of how machine learning might work and
be implemented in some cases. Within the tutorial were some other helpful resources as well,
including tutorials on the NumPy and pandas libraries, which are often used to process,
visualize, and analyze data in machine learning.
I worked through these tutorials as well and would eventually incorporate the
information gained into my final product. I continued my research after the tutorials and began
reading through webpages and literature connected to machine learning. Throughout this
exploration, I came across even more resources, including a site called Kaggle that provides free
datasets on a wide range of topics. This powerful resource hosts thousands of
datasets, including one that I examined that contained 6+ years of data on Microsoft's stock in a
CSV file. A CSV file is a “comma-separated values” file that is, in essence, a plain-text version
of a spreadsheet or table. Another useful website is TensorFlow’s, which offers many pre-trained
models and machine learning programs of various types. One in particular that I looked at was able to
determine which landmark appeared in any uploaded photo with a fairly high level of
accuracy [5]. This independent exploration helped me to establish a firmer image and
understanding of machine learning in the real world and prepared me for the rest of my senior
project.
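To make the CSV format mentioned above concrete, here is a small sketch using Python's built-in csv module. The column names and every value in SAMPLE_CSV are hypothetical and made up for illustration; they are not taken from the Kaggle dataset. Each line of the text is one row of the table, and commas separate the columns.

```python
import csv
import io

# Hypothetical sample rows; the column names and values are invented
# for illustration and do not come from the Kaggle dataset.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2020-01-02,10.0,11.0,9.5,10.5,1000
2020-01-03,10.5,12.0,10.0,11.5,1500
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE_CSV)
# Every field is read as a string, so numbers must be converted explicitly.
closes = [float(row["Close"]) for row in rows]
print(len(rows), closes)
```

With a real file, the text would come from open() on the downloaded file rather than from a string in the program.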
My Product
While I was conducting my research and analysis of machine learning I was also actively
brainstorming ways to utilize my knowledge to benefit my community. I was immediately drawn
back to how I was first introduced to machine learning in my AP CSP class junior year. We had
done a simple exploration activity on machine learning and artificial intelligence, but the
activity was fairly superficial and focused mainly on the results of machine learning rather than
the processes and methods used. Thus, I realized that I could bring additional depth to
the machine learning content in AP CSP by creating a lesson and presenting it to Ocean Lakes’ AP
CSP classes. The first step was creating a product plan and presenting it to Mrs. Graves. In this
product plan, I outlined my reasoning for presenting to AP CSP classes and the pieces that I thought
would be not only interesting but also educational to present. After acquiring approval for my product
plan, I then contacted Mrs. Adriano, the AP CSP teacher, to see if she would be agreeable to
hosting my presentation at some point in December. Mrs. Adriano was very gracious and
accepting of my proposal as well and extended an invitation for me to present to all three of her
AP CSP classes in the first two weeks of December. After gaining approval and setting a
tentative date to present, it was time to begin creating the product itself.
I began weighing a variety of presentation mediums that I could use to convey the
content I wanted to deliver. Some of the options that I considered included a PowerPoint,
website, video series, and poster. To decide which medium would best fit my purposes, I
outlined my goals and the features that mattered most. I knew that I wanted a
product that could continue to be referenced after my presentation as well as one that would be
ideal for self-exploration. Thus, I eventually decided that creating a website would be the best
step forward as students could explore it on their own or be guided through the exploration in my
presentation. This decision was further supported after I began working on my portfolio website
concurrently, meaning that I was already exercising my web design skills and gaining greater
familiarity with different features and resources. In order to best benefit from this practice I
elected to use the same web design platform of Weebly for both sites.
As I began creating the website, I started by outlining my planned pages and the content I
would place on each. This helped me to organize my line of thought and make the
website as linear and logical as possible. I looked back at my Literature Review and the topics
that I had explored during my independent research to gain a better understanding of what I felt
comfortable explaining and what would be appropriate to deliver to AP CSP classes without
overwhelming them. I also immediately knew I would want an introduction/home page, a
resources page, a history page, and an examples page. After creating the outline, I set up the
framework of the website using a Weebly template, trimming away unnecessary features to
reduce it to its foundations. With a solid starting point in place, I needed to integrate
the actual content and materials into the website. This required revisiting my Literature Review
and former resources and refreshing my personal understanding of the material.
For example, as I looked back on my literature review, I realized that linear regression
was a relatively easy method of machine learning to understand and would offer helpful
connections to calculus classes that the students may already be taking or have taken as well.
Thus, I began writing explanations of linear regression, with charts to demonstrate
topics such as loss, derivatives, and gradients that are essential to a solid
understanding of linear regression. I also wanted to provide a real-world example of
machine learning in order to make the material more concrete. Thankfully, I had seen examples
of linear regression and sites offering free datasets during my research and therefore had the
capacity to create a basic machine learning program built on a linear regression model.
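The ideas of loss, derivatives, and gradients named above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code from the website: the toy data, the learning rate, the number of epochs, and the function names mse and fit_line are all chosen only for the example. The model is the line y = w*x + b, the loss is the mean squared error, and each step moves w and b a small distance against the gradient of that loss.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent; return w, b, and the loss history."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        losses.append(mse(w, b, xs, ys))
    return w, b, losses

# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, losses = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Plotting the values in losses against the epoch number gives a loss curve, which falls steeply at first and then flattens as the line settles onto the data.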
Another critical piece of my product was an activity centered around the Turing Test,
created by British computer scientist Alan Turing. In this test, an interviewer questions
two unseen subjects, one of which is the artificial intelligence being tested and the other
a human being. The premise of the test is that if the artificial intelligence has reached a level
comparable to the human, the interviewer should not be able to tell the two
subjects apart any more reliably than by 50/50 guessing [4]. To simulate this test, the
activity had the students split up into groups and roleplay as the computer, the interviewer,
and the human baseline. In order to create slight differences between the computer and the human
subject, I handed out cards with randomly generated words. In each response, the student
playing the computer had to integrate one of those random words into the answer. This would
hopefully create a little awkwardness in the responses, similar to how an interviewer might
identify slight differences between an actual artificial intelligence and a human being in the real
Turing Test.
For the end of the presentation, I made sure to list all of the resources I had found
helpful on the resources page of the website so students could access greater
detail than I was able to provide myself. Finally, after creating and
touching up the website, I practiced my presentation with my friends and family.
I also created Google Forms to collect data on the
effectiveness of my product and acquire feedback on my presentation.
A significant feature of my presentation was a “case study” of machine learning and
linear regression being applied to data on Microsoft’s stock. I collected data from Kaggle, a
website that hosts public datasets, covering around 6 years of Microsoft's stock history.
The data included each day’s opening and closing prices, the number of daily trades on
Microsoft, and the daily highs and lows of the stock. Thus, I had thousands of data points
to analyze. This is a suitable scenario for machine learning because it would be difficult for
either an individual or a traditional, hand-written algorithm to produce a viable analysis of so
much data, whereas machine learning may be able to construct a relatively accurate model from it.
First, I loaded the Kaggle dataset into a pandas DataFrame, a format that is easier to use with
machine learning tools. This required quite a bit of trial and error, as this was a new challenge
for me and required some research into manipulating CSV files as well as the methods associated with
pandas DataFrames. I was then able to find a useful tutorial for linear regression that taught me
much of the process required to turn the data into a model I could visualize. Thankfully,
some of my prior coursework and summer experience made the statistical analysis and calculus
more approachable, and I was able to successfully create loss curves of the model based on different
features and labels. Overall, creating this case study was the most challenging piece of my
product creation, as it frequently pushed me to grow my programming skills and critical
thinking to address challenges, bugs, and errors.
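The workflow described above (CSV data, a pandas DataFrame, and a comparison of features against a label) can be sketched as follows. This is an illustration under several assumptions and not the code from the case study: the column names and all values in CSV_TEXT are invented, the helper fit_and_score is hypothetical, and the line is fit with NumPy's closed-form least-squares routine np.polyfit rather than by iterative training, so it reports a single final loss per feature instead of a loss curve.

```python
import io

import numpy as np
import pandas as pd

# Invented rows standing in for the real dataset; the column names and
# values are assumptions made only for this illustration.
CSV_TEXT = """Date,Open,High,Low,Close,Volume
2020-01-02,10.0,11.0,9.5,10.5,1000
2020-01-03,10.5,12.0,10.0,11.5,1500
2020-01-06,11.5,12.5,11.0,12.0,1200
2020-01-07,12.0,13.5,11.5,13.0,1800
"""

# With a real download, a file path would replace the in-memory text.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"])

def fit_and_score(frame, feature, label):
    """Fit label = slope*feature + intercept by least squares; return slope, intercept, MSE."""
    x = frame[feature].to_numpy(dtype=float)
    y = frame[label].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    mse = float(np.mean((slope * x + intercept - y) ** 2))
    return float(slope), float(intercept), mse

# Compare how well each candidate feature predicts the closing price.
for feature in ["Open", "High", "Low"]:
    slope, intercept, mse = fit_and_score(df, feature, "Close")
    print(f"{feature}: slope={slope:.3f} intercept={intercept:.3f} mse={mse:.4f}")
```

Comparing the mean squared error across features is one simple way to see which column is the most useful predictor of the chosen label.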
Eventually, it was time to deliver my presentation to the classes. The first presentation
was during 1A with the subsequent presentations occurring during 4A and 4B. Each presentation
helped me to improve and correct slight errors or details that I may have misrepresented
initially. I also became more adept at explaining machine learning clearly and at managing
my limited time.
My Results
Overall, I was very pleased with my presentation and the feedback I was able to receive
from the students. I presented to three AP CSP classes and garnered more than 50 responses to
my Google Forms, providing plenty of information for judging whether I had succeeded in my
goals. Over 75% of respondents to the post-lesson survey reported that they understood linear
regression with machine learning, which pleased me because it was a novel concept
that they had likely not encountered before. Additionally, it can be quite confusing and often
requires significant concentration to fully grasp. So even though I was not able to ensure
everyone completely understood linear regression after the lesson, I was able to introduce a new
idea as well as make myself available as a resource if they were interested in pursuing the topic
further. I also included a comments space on the form, where I was able to receive constructive
feedback, as well as some compliments, on my presentation and activity. Some of the comments
entered into this section included: “Great Job!”, “The activity was fun”, and “You did great at
conveying the message, I think you should speak louder at points of emphasis and in general.
Otherwise, I learned a few things and was glad you were able to explain it so well!” One of the
most common themes of the advice concerned my public speaking: students suggested slowing
down and speaking up, especially around areas of particular interest. The public speaking aspect of
the presentation was definitely the most stressful and challenging portion, but it also allowed me
to work on these skills and foster some personal growth.
Additionally, although there was definitely room for improvement, I felt much better
about my presentation and my public speaking performance in general than I thought I would. At
the beginning of my product creation, the public presentation had been my greatest concern, as I
had limited experience speaking publicly for that length of time. After some practice and
extended review of my material, I was much more confident than I expected and, for the most
part, was able to clearly get my points across and answer questions that arose from the students.
However, there are a few things that I would do differently if I repeated the process. I
would modify the activity so that it ran more smoothly, as it ended up being a
little chaotic in practice. For example, I could have allowed the students to generate their own
random lists of words and pre-generated the interview questions to make the activity more
difficult for the interviewer. Additionally, I would have paused more frequently throughout the
presentation so that students could ask questions or request clarification as needed. This would
have been beneficial because, as I realized in hindsight, some students were
encountering some of the topics for the very first time and needed a little longer to
process them before moving on.
Overall, the senior project and my product creation were an enlightening experience that
allowed me to explore my field of interest in greater depth while also gaining valuable skills in
communication, professionalism, and productivity. In the future, I hope to utilize the skills and
knowledge acquired from my senior project in the pursuit of a computer science degree in college.
I have begun applying as a computer science major, in large part due to my experiences in
my senior project. Additionally, I have focused on applying to schools with exemplary machine
learning programs and pathways, with Georgia Tech and MIT as my reach schools. Overall, I
truly believe that machine learning possesses great potential and a tremendous ability to
transform almost every field over the coming years.
Appendix A
Product: Machine Learning Website (www.machinelearninglesson.weebly.com)
Appendix B
Google Form: Pre/Post Lesson Surveys
Appendix C
Survey Responses
References
[1] Artificial Intelligence (AI) vs. Machine Learning. CU-CAI. (2022, March 3).
Retrieved September 21, 2022, from
https://ai.engineering.columbia.edu/ai-vs-machine-learning/#:~:text=Put%20in%20context%2C%20artificial%20intelligence,and%20improve%20themselves%20through%20experience
[2] Brown, S. (2021, April 21). Machine Learning, Explained. MIT Sloan. Retrieved
September 21, 2022, from
https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained
[3] Google. (n.d.). Introduction to machine learning | google developers. Google.
Retrieved September 21, 2022, from
https://developers.google.com/machine-learning/crash-course/ml-intro
[4] Oppy, G., & Dowe, D. (2021, October 4). The Turing Test. Stanford Encyclopedia of
Philosophy. Retrieved September 21, 2022, from
https://plato.stanford.edu/entries/turing-test/
[5] Weyand, T., Araujo, A., Cao, B., & Sim, J. (2020). Google Landmarks Dataset v2 - A
Large-Scale Benchmark for Instance-Level Recognition and Retrieval. Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition.
https://arxiv.org/abs/2004.01804
[6] What is Python? Executive Summary. Python.org. (n.d.). Retrieved September 21,
2022, from https://www.python.org/doc/essays/blurb/
[7] Xu, Y., Liu, X., Cao, X., Huang, C., Liu, E., Qian, S., Liu, X., Wu, Y., Dong, F., Qiu,
C.-W., Qiu, J., Hua, K., Su, W., Wu, J., Xu, H., Han, Y., Fu, C., Yin, Z., & Zhang, J.
(2021, October 28). Artificial Intelligence: A powerful paradigm for scientific research.
The Innovation. Retrieved September 11, 2022, from
https://www.sciencedirect.com/science/article/pii/S2666675821001041