
Kyle Phillips Isearch Paper

This document discusses the author's experience in the Remote Experience for Young Engineers and Scientists (REYES) program. It provides background on the author's interest in machine learning and computer science. It describes the REYES program, which provides virtual lectures and mentorship opportunities in STEM fields. It outlines the author's initial knowledge of machine learning from prior coursework and their goals for the project, which were to better understand machine learning models, data processing techniques, and real-world applications.


RUNNING HEAD: MACHINE LEARNING PRINCIPLES

Machine Learning Principles and Linear Regression Modeling

Kyle Phillips

Author’s Note

I completed my shadowing/summer experience with Old Dominion University’s Remote

Experience for Young Engineers and Scientists (REYES) program with Dr. Raul Briceno. Dr.

Briceno was a professor of nuclear physics at ODU before transitioning to UC Berkeley

in the past year. I would like to express my gratitude towards Dr. Briceno and all the amazing

faculty and lecturers who volunteered their time to the REYES program and made the sessions

so informative and enjoyable. Additionally, I would like to thank Mrs. Graves, my friends, and

my family for their support and encouragement throughout the entire process of Senior Project

and all it entailed. Finally, if you would like to contact me about my project or experiences,

please do not hesitate to do so at the following email address: [email protected].

Introduction

Machine learning is a field that is rapidly gaining traction and influence in all types of

industries and applications. A 2020 survey found that 2 out of 3 businesses

were currently using machine learning in some capacity and that 97% of businesses planned to

implement it within the next year [2]. Additionally, the market value of machine learning as a

service is projected to grow by almost 40% a year until the year 2030, demonstrating the massive

growth still awaiting the field [4]. Machine learning is currently being utilized for data analysis,

novel products, and pattern recognition, among others. For example, tech companies are utilizing

machine learning in their products such as in virtual assistants like Siri or Alexa. Additionally,

retail companies may use machine learning to recommend products or increase the effectiveness

of their advertising [3]. All of these applications demonstrate the versatility and overall impact

machine learning has on society, an impact that is only expected to grow in the future. As the

field grows, there is also a need for increased training and

preparation for industry professionals in machine learning to meet the rising demand. In my

personal life, I have witnessed the expansion of machine learning all around me. In almost every

home there is some “smart” technology utilizing machine learning for language recognition or

another purpose. Additionally, Internet based services have been quick to implement machine

learning, with a contemporary example being social media sites like TikTok. TikTok’s

mysterious algorithm that curates ForYou pages all around the world functions through the use

of machine learning and its data analytics. With this strong interest in computer science and

machine learning, I sought out a summer experience that would allow me to pursue these fields

as well as others.

Old Dominion University hosts an annual program called the Remote Experience for

Young Engineers and Scientists (REYES). This program offers individuals of all backgrounds

increased access to STEM information and opportunities from all over the world. The

program offers a variety of virtual lectures on topics ranging from the science of ants to

cybersecurity. These lectures are broadcast live on the Internet, where viewers are able to

join sessions and ask questions in real time, and are also posted on YouTube afterward for

interested scholars to view at their own leisure. This is extremely helpful as the program is able

to remain fluid and adaptable to complex schedules. It also allows participants the ability to

pursue the topics that they are most interested in and have the most passion for. Additionally, in

recent years the program has begun developing a mentorship program where students can apply

to small focus groups underneath a distinguished professor specializing in a specific field. For

example, this year there were mentorships on nuclear physics, computer science, and more. Each

of these mentorship programs is unique in its own way with different requirements and

management but all of them are extremely informative and educational.

What I Knew

Ever since middle school I have been interested in computer science and the creation of

impactful, beneficial technologies and programs. This interest increased with my entrance into

the Math and Science Academy as I was able to select a variety of courses in many different

fields to pursue my interests. One such course was AP Computer Science A, which I took during

my sophomore year. I had been interested in coding for a while but at the time it was a fairly

mystical concept that I did not fully comprehend. I remember panicking and frantically texting

and contacting friends and peers on how to print out the phrase “Hello World” for the summer

assignment. However, I quickly calmed down and became very interested in the material over

the course of the school year and would often challenge myself with practice problems and

experiment with creating different programs. Computer Science A was largely focused on the

Java programming language; however, other programming languages such as Python are more

commonly used with machine learning. Thus, I knew that Python was one area in which I had

room to grow and work on throughout my project [6].

Then, during my junior year I elected to enroll in AP Computer Science Principles, a

more general computer science course that focused more on the concepts and ideas behind

programming and computer science in general rather than the more narrow focus that AP CSA

established on the Java programming language and syntax. It was during this class that I was first

introduced to machine learning during an exploration activity using an image based machine

learning program that we trained and tested ourselves. Although I found the activity extremely

interesting, I found myself getting more and more curious as to how the program actually worked

as I tried thinking about how it would be coded using either the Java from AP CSA or the block

coding from AP CSP and found myself at a loss. It was this curiosity that would lay the seed and

foundation for my senior project.

As I began to outline my project and categorize my interest, I was drawn to specific

topics of investigation in machine learning. My broad overarching questions were about the

functioning and implementation of machine learning in the real world. I wanted to know more

about how machine learning worked and how it was able to be such a powerful tool around the

world. Because there were so many applications, I knew that there must be different methods and

styles of programs for the different types of problems. It was due to this that I wished to

investigate unique types of machine learning models. Additionally, I was interested in how

computers are able to take in data as an input and be able to actually analyze and winnow out the

valuable information and patterns from the data to learn and be able to create an output. The

computer’s process was particularly interesting to me after I saw some real world applications of

machine learning that were able to process data in forms such as images and videos which was

very different from the data structures I was familiar with at the time: lists, arrays, and standard

variables. Along this line I was further curious about the methods in which computers took the

analyzed data and produced predictive models and results such as sorting images into either

being of a cat or being of a dog. In general, machine learning was a very mystifying field that

was extremely abstract in both functionality and application and I hoped to pierce through the

confusion in a sense to gain a deeper and more holistic understanding of the field and process.

After this reflection of my interests and of my knowledge of the field, I came up with several

questions to pursue during my project. What are some types of machine learning models? How

can computers process and transform data into accurate predictions? How are different types of

data analyzed? Why is machine learning so beneficial? What differentiates machine learning

from artificial intelligence?

My Story

I first began with building my capacity and preparing myself to delve deeper into

machine learning. This meant I needed to improve my programming skills as well as do some

introductory research into the concepts of machine learning. Thus, I began working through a

series of Python tutorials on YouTube to glean the basic syntax and methods that may be used in

machine learning. Every programming language has slightly different rules and formatting styles

in order for the program to function properly. Thus, it is important to review how each language

you use is unique and what it might use that is different from other languages. For example, I had

learned Java during AP CSA but Java’s syntax and rules were quite different from Python. Java

required each statement, or command, to be ended with a semicolon while Python was more

fluid and just required the line to end. Additionally, Java required subsequent lines of code

within loops or other structures to be delineated with braces while Python just

required indentation to separate the levels of code. These small differences may seem

insignificant, but if the rules of syntax are not followed, the code will not compile or run,

and the compiler or interpreter will report errors that must be debugged before continuing. In

addition to familiarity with Python, I also wanted to gain a little more experience and clarity with

machine learning specifically. After speaking with current seniors such as Ayush Jain about their

experiences with machine learning I was introduced to a website called DeepLizard which

contained helpful introductory tutorials into the concepts and ideas of machine learning without

getting too stuck in the specifics of programming the models.
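
The syntax differences described above can be seen in a short sketch. The function name and sample list below are made up for illustration and are not taken from the tutorials; the comments note what the equivalent Java would require.

```python
# Python marks blocks with indentation and ends a statement at the line break.
# The equivalent Java would need braces around each block and a semicolon
# after each statement.
def count_even(numbers):
    """Return how many values in `numbers` are even."""
    total = 0
    for n in numbers:          # the indented lines below belong to the loop
        if n % 2 == 0:         # a deeper indent belongs to the if statement
            total += 1
    return total               # back at function level, outside the loop


print(count_even([1, 2, 3, 4, 6]))  # prints 3
```

Removing the indentation under the `for` line would raise an `IndentationError` before the program runs, which is the kind of syntax error the paragraph refers to.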



Next, over the summer I began my experience with the REYES virtual program hosted

by Old Dominion University. I also ran into my first problem as I had been unaware that they

were testing a larger version of their mentorship programs and sadly missed the opportunity to

apply into some of the more competitive or interesting programs in computer science. Luckily,

however, I was able to join a nuclear physics mentorship group under Dr. Raul Briceno that

would feature guest lecturers from universities all around the world and cover topics such as

introductions to nuclear physics, statistical analysis, and finally machine learning in nuclear

physics. Throughout this program I was afforded the opportunity to attend sessions underneath

the REYES program as well as underneath the nuclear physics mentorship group. Both resources

were extremely beneficial and interesting as they covered a broad scope of topics ranging from

the science of ants to the field of cybersecurity. This wealth of information allowed me to select

which sessions I felt would be most beneficial to my research or just whatever seemed to be

interesting or what I may be curious about.

In the nuclear mentorship group, there would often be two paired lessons each week with

the first introducing a topic and the second showing applications or otherwise expounding upon

that knowledge. For example, one such pair was on statistical analysis in machine learning. Thus,

the first lesson introduced topics in statistics like probability tables, measures of central

tendency, and data manipulation and then the second lesson provided practice exercises and

exploration. This process was continued for several different types of lessons with all sharing a

communal connection to nuclear physics. There were lessons on statistics one week and then on

machine learning the next with both connecting back to the overarching theme of nuclear

physics. This variety of content was a significant challenge and barrier to my experience as I was

not extremely well versed in the field of nuclear physics, and it took effort for me to be able to

fully digest their references and examples. However, even if I could not decipher their specific

jargon, I was able to glean the larger themes and concepts especially if they were already

somewhat familiar to me in some capacity.

This persistence in the pursuit of knowledge is perhaps best shown in the example of the

statistics lessons. I had taken AP Statistics during my junior year and was thus familiar with

many of the equations and concepts covered in the lesson. However, I did notice several

discrepancies between the two experiences as several variables changed symbols and other small

deviations. These small differences were interesting as I was able to witness the application of

the tools and methods I had just learned in school directly into a field as advanced as nuclear

physics. I deeply enjoyed the REYES program and the lessons on machine learning further

increased my passion for the computer science and machine learning fields. Thus, I felt driven to

continue researching the fields through an independent study over the rest of the summer.

Throughout the rest of the summer, I began exploring and researching both my

overarching and sub questions about machine learning. I first began by compiling a selection of

resources in machine learning that seemed to be reputable, informative, and beneficial. An

incredible resource that I discovered was Google CoLaboratory’s tutorial on machine learning

and regression analysis. I found this to be very informative as it not only explained the process of

creating a simple machine learning program but it also provided visualizations and opportunities

for deeper exploration along the way. I worked through this 30+ hour course through the end of

the summer and gained a much clearer understanding of how machine learning might work and

be implemented in some cases. Within the tutorial were some other helpful resources as well,

including tutorials on the NumPy and pandas libraries, which are often used to process,

visualize, and analyze data in machine learning.



Thus, I worked through these tutorials as well and would eventually incorporate

information gained into my final product. I continued my research after the tutorials and began

reading through webpages and literature connected to machine learning. Throughout this

exploration, I came across even more resources, including a site called Kaggle that provides free

datasets on a wide range of topics. This powerful resource contains thousands of verified

datasets, including one that I examined that contained 6+ years of data on Microsoft's stock in a

CSV file. A CSV file is a ‘comma-separated values’ file: plain text that is, in effect, a large

spreadsheet or table. Another useful website is TensorFlow's, which hosts many trained models

and machine learning programs of various types. One in particular that I looked at was able to

determine which national landmark was seen in any uploaded photo with a fairly high level of

accuracy [5]. This independent exploration helped me to establish a firmer image and

understanding of machine learning in the real world and prepared me for the rest of my senior

project.
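
To make the idea of a CSV file concrete, here is a minimal sketch using Python's built-in `csv` module. The column names and every value in `SAMPLE_CSV` are invented for illustration; they are assumptions and do not come from the Kaggle file.

```python
import csv
import io

# Made-up rows for illustration only; these are not values from the Kaggle file.
# The first line is the header, and each later line is one row of the "table".
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2020-01-02,100.0,102.5,99.5,101.0,1000
2020-01-03,101.0,103.0,100.0,102.0,1200
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
print(len(rows))          # number of data rows (header excluded)
print(rows[0]["Close"])   # every value is read as a string
```

Because the values arrive as strings, they would need to be converted with `float()` before any arithmetic.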

My Product

While I was conducting my research and analysis of machine learning I was also actively

brainstorming ways to utilize my knowledge to benefit my community. I was immediately drawn

to how I was first introduced into machine learning in my AP CSP class junior year. We had

done a simple exploration activity into machine learning and artificial intelligence, but the

activity was fairly superficial and focused mainly on the results of machine learning rather than

the processes and methods used. Thus, I started to realize that I could bring additional depth to

the machine learning content in AP CSP by creating a lesson and presenting to Ocean Lakes’ AP

CSP classes. The first step was creating a product plan and presenting it to Mrs. Graves. In this

product plan, I outlined my reasoning for presenting to AP CSP classes and pieces that I thought

would be not only interesting but also educational to present. After acquiring approval for my product

plan, I then contacted Mrs. Adriano, the AP CSP teacher, to see if she would be agreeable to

hosting my presentation at some point in December. Mrs. Adriano was very gracious and

accepting of my proposal as well and extended an invitation for me to present to all three of her

AP CSP classes in the first two weeks of December. After gaining approval and setting a

tentative date to present, it was time to begin creating the product itself.

I began weighing a variety of presentation mediums that I could use to convey the

content I wanted to deliver. Some of the options that I considered included a PowerPoint,

website, video series, and a poster. In order to decide which product would best fit my purposes I

outlined my goals and what features would make the best medium. I knew that I wanted a

product that could continue to be referenced after my presentation as well as one that would be

ideal for self exploration. Thus, I eventually decided that creating a website would be the best

step forward as students could explore it on their own or be guided through the exploration in my

presentation. This decision was further supported after I began working on my portfolio website

concurrently, meaning that I was already exercising my web design skills and gaining greater

familiarity with different features and resources. In order to best benefit from this practice I

elected to use the same web design platform of Weebly for both sites.

As I began creating the website, I started by outlining my planned pages and the content I

would place on each. This helped me to organize my line of thought and hopefully make the

website as linear and logical as possible. I looked back at my Literature Review and the topics

that I had explored during my independent research to gain a better understanding of what I felt

comfortable explaining and would be appropriate to deliver to AP CSP classes without

overwhelming them. I also immediately knew I would want an introduction/home page, a



resources page, a history page, and an examples page. After creating the outline, I set up the

framework of the website using a Weebly template and trimming down unnecessary features and

reducing it to its base foundations. After generating a solid starting point, I needed to integrate

the actual content and materials into the website. This required review of my Literature Review,

former resources, and overall refreshing my personal understanding of the material.

For example, as I looked back on my literature review, I realized that linear regression

was a relatively easy method of machine learning to understand and would offer helpful

connections to calculus classes that the students may already be taking or have taken as well.

Thus, I began importing and creating explanations of linear regression with charts to demonstrate

topics such as loss, derivatives, gradient, and more that would be essential to a solid

understanding of linear regression. I also wanted to be able to provide a real world example of

machine learning in order to make the material more concrete. Thankfully, I had seen examples

of linear regression and sites to access free databases during my research and therefore had the

capacity to create a basic machine learning program running off of a linear regression model.
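
The ideas of loss, derivatives, and gradient mentioned above can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions, not the code used on the website: the function names, the learning rate, the number of epochs, and the toy data (points on the line y = 2x + 1) are all chosen here for demonstration.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(w, b, xs, ys, lr):
    """Move w and b one small step against the gradient of the loss."""
    n = len(xs)
    # Partial derivatives of the mean squared error with respect to w and b.
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * dw, b - lr * db


def fit(xs, ys, lr=0.05, epochs=2000):
    """Run gradient descent and record the loss before each step."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(epochs):
        losses.append(mse_loss(w, b, xs, ys))
        w, b = gradient_step(w, b, xs, ys, lr)
    return w, b, losses


# Toy data lying exactly on y = 2x + 1 (an assumption for the demo).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, losses = fit(xs, ys)
print(round(w, 3), round(b, 3))
```

Plotting `losses` against the epoch number would give a loss curve: it starts high and flattens as the line settles onto the data.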

Another critical piece of my product was an activity centered around the Turing Test,

created by British computer scientist Alan Turing. This test has an interviewer blindly

interviewing two subjects, one of which is the artificial intelligence being tested and the other is

a human being. The premise of the test is that if the artificial intelligence has reached a level

comparable to the human, the interviewer should not be able to distinguish between the two

subjects with any precision greater than that of 50/50 guesses [4]. Thus, to simulate this test the

activity had the students split up into groups and roleplay as the computer, interviewer, and

human baseline. In order to create the slight differences between the computer and the human

subject, I handed out cards with randomly generated words. In each response the computer

would have to integrate one of those random words into their answer. This would hopefully

create a little awkwardness in their responses, similar to how an interviewer would be able to

identify slight differences between an actual artificial intelligence and a human being in the real

Turing Test.

At the end of the presentation I made sure to attach all of the resources I had found

helpful on my resources page of the website so students could have access to greater specifics

and detail than I may have been able to provide on my website. Finally, after creating and

touching up the website, I practiced my presentation to my friends and family and otherwise

prepared for my upcoming presentation. I also created Google Forms to collect data on the

effectiveness of my product and acquire feedback on my presentation.

A significant feature of my presentation was a “case study” of machine learning and

linear regression being applied to data on Microsoft’s stock. I collected data from Kaggle, an

open dataset website, that contained around 6 years' worth of data on Microsoft's stock.

The data included each day’s opening and closing prices, the number of daily trades on

Microsoft, and the daily highs and lows of the stock. Thus, I had thousands of data points

to analyze. This is an ideal scenario for machine learning because it would be difficult for

either an individual or traditional algorithm to produce a viable analysis of so much data,

whereas machine learning may be able to construct a relatively accurate model based on the data.

First, I loaded the Kaggle dataset into a pandas DataFrame, which is easier to use with

machine learning. This required quite a bit of trial and error, as this was a new challenge for me

and required some research into manipulating CSV files as well as methods associated with

pandas DataFrames. I was then able to find a useful tutorial for linear regression that taught me a

substantial amount of the process required to turn the data into a model I could visualize. Thankfully,

some of my prior coursework and summer experience made the statistical analysis and calculus

more palatable and I was able to successfully create loss curves of the model based on different

features and labels. Overall, creating this case study was the most challenging piece of my

product creation as it frequently challenged me to grow my programming skills and critical

thinking to address challenges, bugs, and compilation errors.
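
The first step of the case study, loading CSV data into pandas and choosing a feature and a label, might look roughly like the sketch below. It assumes the pandas library is installed. The column names and all values are made up for illustration, and choosing `Open` as the feature and `Close` as the label is only one possible pairing, not necessarily the one used in the case study.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file; the column names are assumed.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2020-01-02,100.0,102.5,99.5,101.0,1000
2020-01-03,101.0,103.0,100.0,102.0,1200
2020-01-06,102.0,104.0,101.5,103.5,900
"""

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date"])

feature = df["Open"]   # the input the model learns from
label = df["Close"]    # the value the model tries to predict

print(df.shape)                           # (rows, columns)
print(df[["Open", "Close"]].describe())   # quick summary statistics
```

With a downloaded file, the `io.StringIO(...)` argument would be replaced by the file's path on disk.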

Eventually, it was time to deliver my presentation to the classes. The first presentation

was during 1A with the subsequent presentations occurring during 4A and 4B. Each presentation

helped me to improve and address slight errors or details that may have been misrepresented

initially. I also became more adept at explaining machine learning clearly and with greater time

control over my limited resources.

My Results

Overall, I was very pleased with my presentation and the feedback I was able to receive

from the students. I presented to three AP CSP classes and garnered more than 50 responses to

my Google Forms, providing a plethora of information for me to see if I had succeeded in my

goals. Over 75% of respondents indicated in the post-lesson survey that they understood linear

regression with machine learning, which I was very pleased with as it was a very novel concept

that they had likely not encountered before. Additionally, it can be quite confusing and often

requires significant concentration to fully grasp. So even though I was not able to ensure

everyone completely understood linear regression after the lesson, I was able to introduce a new

idea as well as make myself available as a resource if they were interested in pursuing the topic

further. I also included a comments space on the form, where I was able to receive constructive

feedback, as well as some compliments, on my presentation and activity. Some of the comments

entered into this section included: “Great Job!”, “The activity was fun”, and “You did great at

conveying the message, I think you should speak louder at points of emphasis and in general.

Otherwise, I learned a few things and was glad you were able to explain it so well!” One of the

most common themes in the feedback concerned my public speaking, with suggestions to slow

down and speak up, especially around areas of particular interest. The public speaking aspect of

the presentation was definitely the most stressful and challenging portion, but it also allowed me

to work on these concepts and foster some personal growth as well.

Additionally, although there was definitely room for improvement, I felt much better

about my presentation and my public speaking performance in general than I thought I would. At

the beginning of my product creation, the public presentation had been my greatest concern as I

had limited public speaking experience over that long of a duration. After some practice and

extended review of my material, I was much more confident than I expected and, for the most

part, was able to clearly get my points across and answer questions that arose from the students.

However, there were a few things that I might do differently if I did the process again. I

would have liked to modify the activity to have it run a little smoother as it ended up being a

little chaotic in practice. For example, I could have allowed the students to generate their own

random lists of words and have the random questions pre-generated to make the activity more

difficult for the interviewer. Additionally, I would have paused more frequently throughout the

presentation so that students could ask questions or points of clarification as needed. This would

have been more beneficial in hindsight, as I had not fully considered that some students

were being introduced to some of the topics for the very first time and might need a little longer to

process before moving on.

Overall, the senior project and my product creation made for an enlightening experience that

allowed me to explore my field of interest in greater depth while also gaining valuable skills in

communication, professionalism, and productivity. In the future, I hope to utilize the skills and

knowledge acquired from senior projects in the pursuit of a computer science degree at college

as I have begun applying as a computer science major, in large part due to my experiences in

senior projects. Additionally, I have focused on applying to schools with exemplary machine

learning programs and pathways such as Georgia Tech and MIT as my reach schools. Overall, I

truly believe that machine learning possesses great potential and a tremendous ability to

transform almost every field over the coming years.

Appendix A

Product: Machine Learning Website (www.machinelearninglesson.weebly.com)

Appendix B

Google Form: Pre/Post Lesson Surveys



Appendix C

Survey Responses

References

[1] Artificial Intelligence (AI) vs. Machine Learning. CU-CAI. (2022, March 3).
Retrieved September 21, 2022, from
https://ai.engineering.columbia.edu/ai-vs-machine-learning/#:~:text=Put%20in%20context%2C%20artificial%20intelligence,and%20improve%20themselves%20through%20experience

[2] Brown, S. (2021, April 21). Machine Learning, Explained. MIT Sloan. Retrieved
September 21, 2022, from
https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained

[3] Google. (n.d.). Introduction to machine learning | google developers. Google.
Retrieved September 21, 2022, from
https://developers.google.com/machine-learning/crash-course/ml-intro

[4] Oppy, G., & Dowe, D. (2021, October 4). The Turing Test. Stanford Encyclopedia of
Philosophy. Retrieved September 21, 2022, from
https://plato.stanford.edu/entries/turing-test/

[5] Weyand T, Araujo A, Cao B, Sim J. Google Landmarks Dataset v2-A Large-Scale
Benchmark for Instance-Level Recognition and Retrieval. Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. 2020. Available:
https://arxiv.org/abs/2004.01804

[6] What is Python? Executive Summary. Python.org. (n.d.). Retrieved September 21,
2022, from https://www.python.org/doc/essays/blurb/

[7] Xu, Y., Liu, X., Cao, X., Huang, C., Liu, E., Qian, S., Liu, X., Wu, Y., Dong, F., Qiu,
C.-W., Qiu, J., Hua, K., Su, W., Wu, J., Xu, H., Han, Y., Fu, C., Yin, Z., & Zhang, J.
(2021, October 28). Artificial Intelligence: A powerful paradigm for scientific research.
The Innovation. Retrieved September 11, 2022, from
https://www.sciencedirect.com/science/article/pii/S2666675821001041
