Text Classification for AI Education
Tejal Reddy Randi Williams Cynthia Breazeal
reddyt@[Link] randiw12@[Link] cynthiab@[Link]
MIT Media Lab MIT Media Lab MIT Media Lab
Cambridge, MA, USA Cambridge, MA, USA Cambridge, MA, USA
ABSTRACT
To help middle school students explore Artificial Intelligence (AI),
we built a text classifier extension into a block-based programming
interface using TensorFlow’s K-Nearest Neighbors and Universal
Sentence Encoder libraries. After training a model, students can
incorporate it into their own creations. In this paper, we discuss how
we taught students the AI concepts behind the classifier and how
students used the text classifier to build their own projects. Lastly,
we touch on how our classifier works just as well as other text
classification platforms. This text classification tool and curriculum
are a powerful way to help students become more knowledgeable
about the ever-growing field of AI and to raise their awareness
about applications of AI within their own lives.
Figure 1: Text Classifier extension interface. Users input
training data using the interface on the right and can then
program with their model using the blocks on the left.
KEYWORDS
AI education, text classification, text tagging, machine learning
ACM Reference Format:
Tejal Reddy, Randi Williams, and Cynthia Breazeal. 2021. Text Classification
for AI Education. In SIGCSE ’21: ACM Special Interest Group on Computer
Science Education, March 2021, Toronto, ON. ACM, New York, NY, USA,
3 pages. [Link]
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@[Link].
SIGCSE ’21, March 2021, Toronto, ON
© 2021 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
[Link]
1 PROBLEM AND MOTIVATION
In recent years, Artificial Intelligence (AI) has become increasingly
prevalent in our lives. Because of this, it is important for individuals
of all ages to be aware of how AI impacts them. To help middle
school students learn more about AI, we created the How to Train
Your Robot curriculum to teach students about AI, how it’s used, and
ethical issues with the technology. One key topic of the curriculum
is Text Classification. To teach students this topic, we provided them
with a hands-on opportunity to experiment with text classification
and apply it to their own projects. To enable this exploration, we
created our own model-making application embedded within a
block-based programming platform.
2 BACKGROUND AND RELATED WORK
As AI has become more prevalent, there has been a rapid increase in
work geared towards teaching students about AI [11]. Many AI
platforms such as Cognimates [2], Machine Learning for Kids [8],
Snap! [6], and AppInventor [13] use block-based programming
languages catered towards students unfamiliar with programming.
However, few tools allow students to create and use their own
natural language processing algorithms.
Our extension is most similar to the Machine Learning for Kids
[8] text classifier. However, their model is not directly built into
a programming platform and requires students to generate their
own API keys, which have limited free use. We provided students
with a more streamlined platform that allows them to create models
without limitations. Furthermore, we created activities, similar to
those used in other middle school AI curricula [9], to help students
understand more about AI and its impacts.
2.1 Curriculum Design
We wanted to ensure that students thoroughly understood all the
steps of text classification. To do this, we emphasized the concepts of
(1) word embeddings, (2) the K-Nearest Neighbors (KNN) algorithm,
and (3) classification bias. Students then demonstrated
their understanding in a (4) programming activity and their final
projects.
1. Word Embeddings: Students were introduced to the concept
of how words can be numerically represented with word vectors.
We went through examples of creating a word vector with the
word ‘princess’ and deciding whether the numbers in its vector
corresponding to ‘royalty’, ‘masculinity’, ‘femininity’, and ‘age’
should be high or low.
2. KNN Algorithm: To better understand the KNN algorithm,
students used a visual [4] of words plotted on a 2-D graph. They
learned how the selection of the K parameter can impact the output
of the algorithm.
3. Classification Bias: To illustrate classification bias, students
used a word analogies website to plot jobs such as ‘nurse’, ‘doctor’,
‘scientist’, ‘dancer’, ‘teacher’, ‘actor’, and ‘artist’. From there, they
were able to observe gender biases in how word vectors represent
some of these words.
4. Scratch Activity: Students put their new knowledge into
practice by using a Text Classification model-making extension we
built into a block-based programming interface. They started with
a short tutorial that showed them how to make a robot respond to
three commands: Drive, Dance, and Speak (Figure 1).
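The word-vector and KNN ideas from the curriculum steps above can be illustrated with a small sketch. It is not the classroom material itself: the vector values, the 2-D points, the labels, and the helper names (`WORD_VECTORS`, `cosine_similarity`, `EXAMPLES`, `knn_label`) are all hypothetical and chosen by hand for illustration. The first part compares hand-made word vectors over the four features from the ‘princess’ example; the second shows how changing K can flip the label a KNN vote returns.

```python
import math
from collections import Counter

# Hypothetical hand-made word vectors: (royalty, masculinity, femininity, age).
# The numbers are illustrative only; they do not come from a trained model.
WORD_VECTORS = {
    "princess": (0.9, 0.1, 0.9, 0.2),
    "queen": (0.9, 0.1, 0.9, 0.7),
    "boy": (0.1, 0.9, 0.1, 0.1),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical words plotted as labelled points on a 2-D graph.
EXAMPLES = [
    ((1.0, 1.0), "animal"),
    ((3.0, 2.5), "food"),
    ((2.5, 3.0), "food"),
    ((3.0, 3.0), "food"),
]


def knn_label(query, examples, k):
    """Return the majority label among the k examples closest to query."""
    ranked = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    princess = WORD_VECTORS["princess"]
    # 'princess' points in nearly the same direction as 'queen', not 'boy'.
    print(cosine_similarity(princess, WORD_VECTORS["queen"]))
    print(cosine_similarity(princess, WORD_VECTORS["boy"]))

    query = (1.5, 1.5)
    print(knn_label(query, EXAMPLES, k=1))  # animal: the single nearest point
    print(knn_label(query, EXAMPLES, k=3))  # food: two of the three nearest
```

With K = 1 the query takes the label of its single closest neighbour, while with K = 3 the two slightly more distant ‘food’ points outvote it, which is the kind of effect students explored when varying K.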
3 APPROACH AND UNIQUENESS
The text classifier was designed to maximize ease-of-use and under-
standing for middle school students. The three main components
of the text classifier are the (1) translator, (2) sentence encoder,
and (3) classifier. The classifier is built into a block-based program-
ming interface developed on top of the open-source Scratch Blocks
repository [7] and can be accessed at [Link]
extension-boilerplate/intent-classifier/.
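As a rough illustration of how an encoder and a KNN classifier could be chained, the sketch below uses plain Python rather than the extension's actual TensorFlow code. Several parts are assumptions made for the example: `embed` is a toy stand-in over a small fixed vocabulary (`VOCAB`), not the Universal Sentence Encoder; the class and method names are hypothetical; the translation step is omitted; and K is taken as the floor of the square root of the number of training samples, since the paper gives the square-root rule but not how it is rounded.

```python
import math
from collections import Counter

# Toy stand-in vocabulary; the real encoder (USE) needs no fixed word list.
VOCAB = ("drive", "dance", "speak", "go", "move",
         "talk", "say", "forward", "hello", "spin")


def embed(text):
    """Stand-in encoder: L2-normalised counts of VOCAB words (NOT the real USE)."""
    words = text.lower().split()
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class TextClassifier:
    """Hypothetical KNN text classifier whose K grows with the training set."""

    def __init__(self):
        self.examples = []  # list of (vector, label) pairs

    def add_example(self, text, label):
        self.examples.append((embed(text), label))

    def current_k(self):
        # K = square root of the number of samples; flooring is an assumption.
        return max(1, math.isqrt(len(self.examples)))

    def classify(self, text):
        if not self.examples:
            raise ValueError("add training examples before classifying")
        query = embed(text)
        ranked = sorted(self.examples, key=lambda ex: math.dist(query, ex[0]))
        votes = Counter(label for _, label in ranked[: self.current_k()])
        return votes.most_common(1)[0][0]


# Five examples per label, for the tutorial's three robot commands.
TRAINING = {
    "drive": ["drive forward", "go forward", "move forward", "drive", "go go go"],
    "dance": ["dance", "spin around", "dance and spin", "do a dance", "spin"],
    "speak": ["say hello", "speak", "talk to me", "say something", "speak hello"],
}

clf = TextClassifier()
for label, phrases in TRAINING.items():
    for phrase in phrases:
        clf.add_example(phrase, label)

if __name__ == "__main__":
    print(clf.current_k())                       # 3, since isqrt(15) == 3
    print(clf.classify("please drive forward"))  # drive
    print(clf.classify("spin and dance"))        # dance
```

Tying K to the training-set size means a model with only a handful of phrases votes over very few neighbours, while a larger dataset automatically gets a wider vote without the student having to tune anything.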
1. Translator: Translates user input to the classifier into English.
The language of each input is automatically detected and then
translated to English before being added into the classifier model.
We implemented this translator to ensure students anywhere in the
world can use this tool.
2. Sentence Encoder: Converts user input to the classifier into
a sentence vector using Google’s Universal Sentence Encoder (USE)
[3]. The version of USE used for the text classifier is trained with a
deep averaging network (DAN) encoder, and the output is always
a 512-dimensional vector.
3. Classifier: Creates a text classification model utilizing
TensorFlow’s K-Nearest Neighbors library [10]. The model determines the
appropriate label for new input by comparing it with the phrases
from the training dataset.
As users input more training data, the K parameter is dynamically
calculated as the square root of the number of samples inputted.
We chose this over a fixed K to allow the classifier to adjust if a
larger training dataset was utilized. To combat having too small of
a K, there should be at least five training examples for each label.
4 RESULTS AND CONTRIBUTION
4.1 Student understanding
We tested the classifier with students in the summer of 2020 during
an online class with 29 students. In their daily reflections, students
demonstrated their excitement about the concept of text classification
and the multitude of ways it could be used. “The coolest thing
was the text classification” (Brant, age 13). “Today I got to create
my own command for my robot! That was amazing” (Cacey, age 12).
In their final projects, students used their knowledge about text
classification to implement their own projects. There were a total
of eight final projects that used text classification. Projects tended
to align with the theme of helping others and included a snake
identifier, a TV show suggester, a dog food detector, an addition
robot, a chat robot, a concussion tester, an animal classifier, and a
healthcare robot (Figure 2).
Figure 2: Code from Healthcare Robot final project
4.2 Comparison with other text classifiers
To determine the effectiveness of our text classifier, we compared
it against two similar text classifiers for children: the Machine
Learning for Kids (ML4Kids) text classifier and UClassify’s text
classifier [12]. We generated two test datasets, one containing
phrases labeled as clickbait or not clickbait [1] and the other
containing movie reviews labeled by sentiment [5].
We conducted three rounds of testing with each dataset, training
on ten randomly selected inputs in each round. We used only five
random inputs for each label to imitate how children used these
tools. We then evaluated each classifier on four random phrases
(two of each label) per round, for a total of 12 tests per dataset.
On the clickbait dataset, our classifier classified 11 of the 12 tests
(91.7%) correctly across all trials. ML4Kids classified 10/12 (83.3%)
correctly. UClassify classified 5/12 (41.7%) correctly with 5/12
(41.7%) not yielding a definite classification. On the movie reviews
dataset, our classifier classified 11/12 (91.7%) correctly. ML4Kids
classified 7/12 (58.3%) correctly. UClassify classified 5/12 (41.7%)
correctly with 1/12 (8.3%) not yielding a definite classification.
5 CONCLUSIONS AND FUTURE WORK
In this work, we described the process of creating and implementing
a text classifier for the How to Train Your Robot curriculum. Through
the use of final projects, we saw that the text classifier and related
activities were effective in helping students understand how text
classification works as well as its uses. Many of the students used
the classifier in their final projects to help others, and by doing
this, were able to reinforce the concepts taught in class. In the
future, we hope to improve this understanding by adding a KNN plot
so that students can visualize the reasons behind their classifiers’
decisions.
ACKNOWLEDGMENTS
We would like to thank the teachers and students who participated
as well as Amazon Future Engineer for supporting the program.
REFERENCES
[1] Abhijnan Chakraborty, Bhargavi Paranjape, Sourya Kakarla, and Niloy Ganguly.
2016. Stop Clickbait: Detecting and preventing clickbaits in online news media.
In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM
International Conference on. IEEE, 9–16.
[2] Stefania Druga. 2018. Cognimates. [Link]
[3] Google. [n.d.]. [Link]
[4] Italo José. [n.d.]. [Link]
a4707b24bd1d
[5] Kaggle. [n.d.]. [Link]
of-imdb-movie-reviews/data
[6] Ken Kahn, Rani Megasari, Erna Piantari, and Enjun Junaeti. 2018. AI Programming
by Children using Snap! Block Programming in a Developing Country. In EC-TEL
Practitioner Proceedings 2018: 13th European Conference On Technology Enhanced
Learning, Leeds, UK, September 3-6, 2018. [Link]
[7] Lifelong Kindergarten. [n.d.]. [Link]
[8] Machine Learning for Kids. [n.d.]. [Link]
[9] Personal Robots Group and I2 Learning. [n.d.]. [Link]
documents/[Link]
[10] TensorFlow. [n.d.]. [Link]
classifier
[11] David Touretzky, Christina Gardner-McCune, Fred Martin, and Deborah Seehorn.
2019. Envisioning AI for K-12: What should every child know about AI?. In
AAAI.
[12] Uclassify. [n.d.]. [Link]
[13] Jessica Raquelle Van Brummelen. 2019. Tools to create and democratize
conversational artificial intelligence.