
Introduction to Computer Science

SENIOR CONTRIBUTING AUTHOR


DR. JEAN-CLAUDE FRANCHITTI, NYU COURANT INSTITUTE
OpenStax
Rice University
6100 Main Street MS-375
Houston, Texas 77005

To learn more about OpenStax, visit https://openstax.org.


Individual print copies and bulk orders can be purchased through our website.

©2024 Rice University. Textbook content produced by OpenStax is licensed under a Creative Commons
Attribution 4.0 International License (CC BY 4.0). Under this license, any user of this textbook or the textbook
contents herein must provide proper attribution as follows:

- If you redistribute this textbook in a digital format (including but not limited to PDF and HTML), then you
must retain on every page the following attribution:
“Access for free at openstax.org.”
- If you redistribute this textbook in a print format, then you must include on every physical page the
following attribution:
“Access for free at openstax.org.”
- If you redistribute part of this textbook, then you must retain in every digital format page view (including
but not limited to PDF and HTML) and on every physical printed page the following attribution:
“Access for free at openstax.org.”
- If you use this textbook as a bibliographic reference, please include
https://openstax.org/details/books/introduction-computer-science in your citation.

For questions regarding this licensing, please contact [email protected].

Trademarks
The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, OpenStax CNX logo,
OpenStax Tutor name, OpenStax Tutor logo, Connexions name, Connexions logo, Rice University name, and
Rice University logo are not subject to the license and may not be reproduced without the prior and express
written consent of Rice University.

Kendall Hunt and the Kendall Hunt Logo are trademarks of Kendall Hunt. The Kendall Hunt mark is registered
in the United States, Canada, and the European Union. These trademarks may not be used without the prior
and express written consent of Kendall Hunt.

COLOR PAPERBACK BOOK ISBN-13 978-1-711471-83-9


B&W PAPERBACK BOOK ISBN-13 978-1-711471-82-2
DIGITAL VERSION ISBN-13 978-1-961584-58-7
ORIGINAL PUBLICATION YEAR 2024
1 2 3 4 5 6 7 8 9 10 CJP 24
OPENSTAX

OpenStax provides free, peer-reviewed, openly licensed textbooks for introductory college and Advanced
Placement® courses and low-cost, personalized courseware that helps students learn. A nonprofit ed tech
initiative based at Rice University, we’re committed to helping students access the tools they need to complete
their courses and meet their educational goals.

RICE UNIVERSITY

OpenStax is an initiative of Rice University. As a leading research university with a distinctive commitment to
undergraduate education, Rice University aspires to path-breaking research, unsurpassed teaching, and
contributions to the betterment of our world. It seeks to fulfill this mission by cultivating a diverse community
of learning and discovery that produces leaders across the spectrum of human endeavor.

PHILANTHROPIC SUPPORT

OpenStax is grateful for the generous philanthropic partners who advance our mission to improve educational
access and learning for everyone. To see the impact of our supporter community and our most updated list of
partners, please visit openstax.org/foundation.

Arnold Ventures
Burt and Deedee McMurtry
Chan Zuckerberg Initiative
Michelson 20MM Foundation
Chegg, Inc.
National Science Foundation
Arthur and Carlyse Ciocca Charitable Foundation
The Open Society Foundations
Digital Promise
Jumee Yhu and David E. Park III
Ann and John Doerr
Brian D. Patterson USA-International Foundation
Bill & Melinda Gates Foundation
The Bill and Stephanie Sick Fund
Girard Foundation
Steven L. Smith & Diana T. Go
Google Inc.
Stand Together
The William and Flora Hewlett Foundation
Robin and Sandy Stuart Foundation
The Hewlett-Packard Company
The Stuart Family Foundation
Intel Inc.
Tammy and Guillermo Treviño
Rusty and John Jaggers
Valhalla Charitable Foundation
The Calvin K. Kazanjian Economics Foundation
White Star Education Foundation
Charles Koch Foundation
Schmidt Futures
Leon Lowenstein Foundation, Inc.
William Marsh Rice University
The Maxfield Foundation


Study where you want, what
you want, when you want.
When you access your book in our web view, you can use our new online
highlighting and note-taking features to create your own study guides.

Our books are free and flexible, forever.


Get started at openstax.org/details/books/introduction-computer-science

Access. The future of education.


openstax.org
CONTENTS

Preface 1

PART 1 PROBLEM SOLVING AND ALGORITHMS


1 Introduction to Computer Science 9
Introduction 9
1.1 Computer Science 10
1.2 Computer Science across the Disciplines 20
1.3 Computer Science and the Future of Society 25
Chapter Review 33

2 Computational Thinking and Design Reusability 39


Introduction 39
2.1 Computational Thinking 40
2.2 Architecting Solutions with Adaptive Design Reuse in Mind 53
2.3 Evolving Architectures into Useable Products 73
Chapter Review 83

3 Data Structures and Algorithms 91


Introduction 91
3.1 Introduction to Data Structures and Algorithms 91
3.2 Algorithm Design and Discovery 100
3.3 Formal Properties of Algorithms 107
3.4 Algorithmic Paradigms 113
3.5 Sample Algorithms by Problem 119
3.6 Computer Science Theory 127
Chapter Review 131

PART 2 REALIZATIONS OF ALGORITHMS


4 Linguistic Realization of Algorithms: Low-Level Programming Languages 145
Introduction 145
4.1 Models of Computation 146
4.2 Building C Programs 158
4.3 Parallel Programming Models 175
4.4 Applications of Programming Models 181
Chapter Review 184

5 Hardware Realizations of Algorithms: Computer Systems Design 195


Introduction 195
5.1 Computer Systems Organization 196
5.2 Computer Levels of Abstraction 200
5.3 Machine-Level Information Representation 208
5.4 Machine-Level Program Representation 214
5.5 Memory Hierarchy 221
5.6 Processor Architectures 230
Chapter Review 234

6 Infrastructure Abstraction Layer: Operating Systems 243


Introduction 243
6.1 What Is an Operating System? 244
6.2 Fundamental OS Concepts 250
6.3 Processes and Concurrency 262
6.4 Memory Management 272
6.5 File Systems 279
6.6 Reliability and Security 284
Chapter Review 290

PART 3 DESIGNING AND DEVELOPING SOFTWARE SOLUTIONS


7 High-Level Programming Languages 303
Introduction 303
7.1 Programming Language Foundations 304
7.2 Programming Language Constructs 317
7.3 Alternative Programming Models 337
7.4 Programming Language Implementation 346
Chapter Review 353

8 Data Management 365


Introduction 365
8.1 Data Management Focus 366
8.2 Data Management Systems 372
8.3 Relational Database Management Systems 378
8.4 Nonrelational Database Management Systems 396
8.5 Data Warehousing, Data Lakes, and Business Intelligence 403
8.6 Data Management for Shallow and Deep Learning Applications 408
8.7 Informatics and Data Management 418
Chapter Review 420

9 Software Engineering 435


Introduction 435
9.1 Software Engineering Fundamentals 436
9.2 Software Engineering Process 446

9.3 Special Topics 474
Chapter Review 496

10 Enterprise and Solution Architectures Management 507


Introduction 507
10.1 Patterns Management 508
10.2 Enterprise Architecture Management Frameworks 516
10.3 Solution Architecture Management 551
Chapter Review 556

PART 4 BUILDING MODERN END-TO-END SOLUTIONS TO BUSINESS AND SOCIAL PROBLEMS


11 Web Applications Development 565
Introduction 565
11.1 Modern Web Applications Architectures 566
11.2 Sample Responsive WAD with Bootstrap and Django 584
11.3 Sample Responsive WAD with Bootstrap/React and Node 610
11.4 Sample Responsive WAD with Bootstrap/React and Django 625
11.5 Sample Native WAD with React Native and Node or Django 630
11.6 Sample Ethereum Blockchain Web 2.0/Web 3.0 Application 643
Chapter Review 656

12 Cloud-Native Applications Development 665


Introduction 665
12.1 Introduction to Cloud-Native Applications 666
12.2 Cloud-Based and Cloud-Native Applications Deployment Technologies 690
12.3 Example PaaS and FaaS Deployments of Cloud-Native Applications 711
Chapter Review 750

13 Hybrid Multicloud Digital Solutions Development 761


Introduction 761
13.1 Hybrid Multicloud Solutions and Cloud Mashups 762
13.2 Big Cloud IaaS Mainstream Capabilities 766
13.3 Big Cloud PaaS Mainstream Capabilities 774
13.4 Towards Intelligent Autonomous Networked Super Systems 790
Chapter Review 805

PART 5 HUMAN-CENTERED RESPONSIBLE COMPUTING


14 Cyber Resources Qualities and Cyber Computing Governance 817
Introduction 817
14.1 Cyber Resources Management Frameworks 818
14.2 Cybersecurity Deep Dive 839
14.3 Governing the Use of Cyber Resources 899
Chapter Review 906

Appendix A: Network Design Application of Algorithms 917

Index 923

Preface
About OpenStax
OpenStax is part of Rice University, which is a 501(c)(3) nonprofit charitable corporation. As an educational
initiative, it's our mission to improve educational access and learning for everyone. Through our partnerships
with philanthropic organizations and our alliance with other educational resource companies, we're breaking
down the most common barriers to learning. Because we believe that everyone should and can have access to
knowledge.

About OpenStax Resources


Customization
Introduction to Computer Science is licensed under a Creative Commons Attribution 4.0 International (CC BY)
license, which means that you can distribute, remix, and build upon the content, as long as you provide
attribution to OpenStax and its content contributors.

Because our books are openly licensed, you are free to use the entire book or select only the sections that are
most relevant to the needs of your course. Feel free to remix the content by assigning your students certain
chapters and sections in your syllabus, in the order that you prefer. You can even provide a direct link in your
syllabus to the sections in the web view of your book.

Instructors also have the option of creating a customized version of their OpenStax book. Visit the Instructor
Resources section of your book page on OpenStax.org for more information.

Art Attribution
In Introduction to Computer Science, art contains attribution to its title, creator or rights holder, host platform,
and license within the caption. Because the art is openly licensed, anyone may reuse the art as long as they
provide the same attribution to its original source.

Errata
All OpenStax textbooks undergo a rigorous review process. However, like any professional-grade textbook,
errors sometimes occur. In addition, the wide range of evidence, standards, practices, data, and legal
circumstances in computer science change frequently, and portions of the text may become out of date. Since
our books are web-based, we can make updates periodically when deemed pedagogically necessary. If you
have a correction to suggest, submit it through the link on your book page on OpenStax.org. Subject matter
experts review all errata suggestions. OpenStax is committed to remaining transparent about all updates, so
you will also find a list of past and pending errata changes on your book page on OpenStax.org.

Format
You can access this textbook for free in web view or PDF through OpenStax.org, and for a low cost in print. The
web view is the recommended format because it is the most accessible—including being WCAG 2.2 AA
compliant—and most current. Print versions are available for individual purchase, or they may be ordered
through your campus bookstore.

About Introduction to Computer Science


Introduction to Computer Science provides a comprehensive foundation in core computer science concepts
and principles, aligning with the scope and sequence of most introductory computer science courses. The
textbook serves as an engaging entry point for students pursuing diverse fields of study and employment,
including computer science, business, engineering, data science, social sciences, and related disciplines. By
addressing a broad learner audience—ranging from computer science majors to non-majors—the book offers
a thorough introduction to computational thinking and its applications across multiple domains.

Introduction to Computer Science is designed to be both interactive and practical, focusing on real-world
applications that showcase how core computer science concepts can be used to solve complex problems.
Students will explore foundational topics, such as algorithms, data structures, computer systems organization,
and software development, using an array of engaging, hands-on activities. The textbook integrates
meaningful learning experiences through chapter-based scenarios, problem-solving exercises, and project-
based assessments that encourage students to apply what they learn in authentic contexts.

Features such as embedded coding exercises, industry insights, and explorations of emerging technology
trends provide a holistic approach to learning that extends beyond theory. With a forward-looking perspective,
Introduction to Computer Science prepares students to engage with advanced topics in computer science,
such as machine learning, cybersecurity, and cloud computing, ensuring they have a solid foundation for
continued study and future professional success.

Coverage and Scope


The authors and contributors consulted with other educators and industry professionals from a range of
institutions and organizations in order to ensure that Introduction to Computer Science meets the diverse
needs of both computer science majors and non-majors. The book is structured into five main parts, each
focusing on critical areas of the discipline:

• Part 1: Problem Solving and Algorithms This section introduces students to the foundations of
computer science, focusing on computational thinking, problem-solving techniques, and algorithm
design. Topics include data structures, formal properties of algorithms, and algorithmic paradigms.
Through practical examples and exercises, students will develop the skills needed to construct and analyze
algorithms and understand their applications across various domains.
• Part 2: Realizations of Algorithms In this part, students explore how algorithms are realized in hardware
and software, starting with low-level programming languages and moving into hardware design and
computer systems organization. Students will learn about models of computation, machine-level
representation, processor architectures, and memory hierarchy. This foundational understanding enables
students to see the connections between abstract algorithms and their physical implementations.
• Part 3: Designing and Developing Software Solutions This section covers the principles of software
development, high-level programming languages, and data management. Students will learn the
fundamentals of software engineering and gain hands-on experience with both relational and non-
relational database systems. Emphasis is placed on designing robust software solutions and managing
complex data structures, ensuring students are well-prepared for future roles in software development
and engineering.
• Part 4: Building Modern End-to-End Solutions to Business and Social Problems Students apply their
knowledge to design and build web and cloud-native applications. This part includes examples of modern
web architectures, responsive design techniques, and cloud-based solutions using PaaS and FaaS
technologies. Additionally, students will explore the development of hybrid multi-cloud digital solutions,
providing them with experience in addressing complex business and social challenges using modern
computing technologies.
• Part 5: Human-Centered Responsible Computing The final section delves into the ethical and societal
implications of computing. Topics include cybersecurity, governance of cyber resources, and responsible
computing practices. Students will learn to navigate the complexities of cybersecurity and governance
while considering the broader impacts of technology on society.

Each core concept is designed to build on the previous one, ensuring a coherent learning experience that
provides students with a clear view of the field. The book’s approach enables students to not only understand
the principles of computer science but also see how they can be applied to address practical, real-world
problems.

Pedagogical Foundation and Features


The Introduction to Computer Science textbook is designed to engage students through a combination of
practical, real-world applications and thought-provoking scenarios that promote critical thinking and a deeper
understanding of core concepts. The pedagogical approach is centered on making computer science relevant
and accessible for students from diverse backgrounds, whether they are pursuing a degree in computer
science or exploring how computational thinking can be applied to their respective fields. To support this
vision, the textbook incorporates several key features:

• Concepts in Practice features present how computer science concepts are applied in real-world contexts
by both professionals and non-professionals. Each box profiles personas and practical applications that
demonstrate how core topics, such as algorithms, data management, and software engineering, are
utilized across various industries. The purpose is to inspire students—particularly non-majors—by
showing them the value of computer science in solving everyday challenges and to foster a greater
appreciation for the discipline.
• Global Issues in Technology features help students think globally about the societal impact of
technology. These boxes highlight how technology affects communities and economies around the world
and may introduce topics such as digital equity, environmental sustainability, and global data security.
Students are encouraged to consider the broader implications of technological advancements and to think
critically about their potential to drive positive change or create new challenges in global contexts.
• Industry Spotlight boxes focus on specific industry challenges and how technology progress or
application can help solve them. Industry Spotlight features introduce students to various sectors—such
as healthcare, finance, education, and law—providing a glimpse into how computer science can drive
innovation and efficiency. By connecting theoretical concepts to industry-specific problems, these features
encourage students to explore the wide-ranging applications of computer science and understand its
value across different fields.
• Link to Learning features provide a very brief introduction to online resources—videos, interactives,
collections, maps, and other engaging resources that are pertinent to students’ exploration of the topic at
hand.
• Technology in Everyday Life features connect computer science principles to students’ personal
experiences and the world around them. These boxes explore how technology intersects with daily life or
current events, making computer science concepts more relatable and relevant. Some features may
prompt students to think creatively and propose their own ideas for applying computer science solutions
to familiar scenarios.
• Think It Through scenarios present students with thought-provoking dilemmas or complex problems
related to the use of technology. Students are asked to reflect on ethical questions, problem-solving
strategies, and real-world decision-making processes. These features emphasize that not all problems
have straightforward answers and encourage students to weigh the pros and cons of different
approaches. By navigating these scenarios, students learn to develop judgment skills that are crucial in
business and technology environments.

Overall, these features are integrated throughout the material to foster active learning, critical thinking, and
an appreciation for the practical applications of computer science. By connecting theory to practice and
encouraging students to explore real-world issues, Introduction to Computer Science provides a meaningful
and supportive learning experience that equips students with the knowledge and skills necessary for success
in their academic and professional journeys.

Answers to Questions in the Book

The end-of-chapter Review, Conceptual Questions, Practice Exercises, Problem Sets, Thought Provokers, and
Labs are intended for homework assignments or classroom discussion; thus, student-facing answers are not
provided in the book. Answers and sample answers are provided in the Instructor Answer Guide, for
instructors to share with students at their discretion, as is standard for such resources.

About the Author


Senior Contributing Author

Senior contributing author: Dr. Jean-Claude Franchitti

Dr. Jean-Claude Franchitti is a Clinical Associate Professor of Computer Science and the Associate Director of
Graduate Studies for the CS Master’s program in Information Systems at NYU Courant Institute. He earned his
M.S. in Electrical Engineering (1985) and his M.S. and Ph.D. in Computer Science from the University of
Colorado at Boulder (1988, 1993). He is the founder and CEO of Archemy, Inc., and has over 40 years of
experience in a myriad of industries, and over 30 years of teaching and corporate training experience. He held
executive positions in large US-based corporations and leading business technology consulting firms such as
Computer Sciences Corporation. He has been involved in many large business technology strategy and
modernization projects and has a proven record of delivering large scale business solutions. He was the
original designer and developer of jcrew.com and the suite of products now known as IBM InfoSphere
DataStage. He also created the Agile Enterprise Architecture Management (AEAM) methodology and a
corresponding framework that are patented components of the Archemy business evolution platform. He has
developed partnerships with many companies at New York University to incubate new methodologies,
transition research into business solutions, help recruit skilled graduates, and increase the companies’
visibility. Dr. Franchitti has been a reviewer member on several industry standards committees including OMG,
ODMG, and X3H2. Dr. Franchitti taught at CU-Boulder, Denver University, Columbia University, and NYU SCPS
before joining NYU Courant Institute in 1997. He has extensive experience with corporate training and
developed and delivered training and mentoring programs for the top corporate education providers. He
conducted research as part of several NSF- and DARPA-funded research programs.

Dr. Franchitti’s teaching and research interests include machine learning, artificial intelligence, data
management systems and software engineering with an emphasis on large-scale software architectures and
business solutions. He has published articles in numerous refereed publications including the Proceedings of
Third Int. Conf. on Cooperative Information Systems, Proceedings of the Sixth International Workshop on
Persistent Object Systems, and the 16th International Conference on Software Engineering. Dr. Franchitti
received an award for Outstanding Service from NYU’s School of Continuing and Professional Studies.

Contributing Authors
Amal Alhosban, University of Michigan–Flint

Mark Buckler, Grand Canyon University

Joanna Gilberti, Archemy, Inc.

Scott Gray, Nashua Community College

Matthew Hertz, University at Buffalo

Andrew Hurd, Empire State University

Kevin Lin, University of Washington

Sai Mukkavilli, Georgia Southwestern State University

Phuc (Brian) Nguyen, UC Irvine

Shahab Tayeb, Fresno State

Zdeněk Troníček, Tarleton State University

Kevin Wortman, Cal State Fullerton

Mohamed Zahran, New York University

Reviewers
Reni Abraham, Houston Community College

Shakil Akhtar, Clayton State University

Kiavash Bahreini, Florida International University

Tammie Bolling, Pellissippi State Community College

Phillip Bradford, UConn

Quiana Bradshaw, Campbellsville University

Christopher Bunton, Austin Community College

Komal Chhibber, South Mountain Community College

Chen-Fu Chiang, SUNY Polytechnic Institute

Gabriel de la Cruz, North Idaho College

Gabriel Ferrer, Hendrix College

David Fogarty, New York University

Yuexing Hao, Massachusetts Institute of Technology

Nazli Hardy, Millersville University

Angela Heath, Baptist Health System

Rania Hodhod, Columbus State University

Benita Hubbard, Southern New Hampshire University

Sumit Jha, Florida International University

Mohamed E. Khalefa, SUNY Old Westbury

Steven Ko, Simon Fraser University

Jessica Kong, University of Washington

Alex Krasnok, Florida International University

Blaise Liffick, Millersville University

Sue McCrory, Missouri State University

Morgan McKie, Florida International University

Sandeep Mitra, SUNY Brockport

Mourya Narasareddygari, Rider University


Saty Raghavachary, University of Southern California

Muhammad Rahman, Clayton State University

Amir Rahmati, Stony Brook University

Caryl Rahn, Florida International University

Jerry Reed, Valencia College

Jordan Ringenberg, University of Findlay

Eman Saleh, University of Georgia

Vincent Sanchez, Florida International University

John Schriner, Queensborough Community College

Tiffanie R. Smith, Lincoln University

Hann So, De Anza College

Jayesh Soni, Florida International University

Derrick Stevens, Mohawk Valley Community College

Kathleen Tamerlano, Cuyahoga Community College

Chintan Thakkar, Rasmussen University

Jingnan Xie, Millersville University

Ning Xie, Florida International University

Additional Resources
Student and Instructor Resources
We have compiled additional resources for both students and instructors, including Getting Started Guides, an
instructor’s answer guide, test bank, and image slides. Instructor resources require a verified instructor
account, which you can apply for when you log in or create your account on OpenStax.org. Take advantage of
these resources to supplement your OpenStax book.

Instructor’s answer guide. Each component of the instructor’s guide is designed to provide maximum
guidance for delivering the content in an interesting and dynamic manner.

Test bank. With hundreds of assessment items, instructors can customize tests to support a variety of course
objectives. The test bank includes review questions (multiple-choice, identification, fill-in-the-blank, true/false),
short answer questions, and long answer questions to assess students on a variety of levels. The test bank is
available in Word format.

PowerPoint lecture slides. The PowerPoint slides provide learning objectives, images and descriptions,
feature focuses, and discussion questions as a starting place for instructors to build their lectures.

Academic Integrity
Academic integrity builds trust, understanding, equity, and genuine learning. While students may encounter
significant challenges in their courses and their lives, doing their own work and maintaining a high degree of
authenticity will result in meaningful outcomes that will extend far beyond their college career. Faculty,
administrators, resource providers, and students should work together to maintain a fair and positive
experience.

We realize that students benefit when academic integrity ground rules are established early in the course. To
that end, OpenStax has created an interactive to aid with academic integrity discussions in your course.

Visit our academic integrity slider (https://view.genial.ly/61e08a7af6db870d591078c1/interactive-image-defining-academic-integrity-interactive-slider). Click and drag icons along the continuum to align these practices with your institution and course policies. You
may then include the graphic on your syllabus, present it in your first course meeting, or create a handout for students. (attribution:
Copyright Rice University, OpenStax, under CC BY 4.0 license)

At OpenStax we are also developing resources supporting authentic learning experiences and assessment.
Please visit this book’s page for updates. For an in-depth review of academic integrity strategies, we highly
recommend visiting the International Center of Academic Integrity (ICAI) website at
https://academicintegrity.org/ (https://academicintegrity.org/).

Community Hubs
OpenStax partners with the Institute for the Study of Knowledge Management in Education (ISKME) to offer
Community Hubs on OER Commons—a platform for instructors to share community-created resources that
support OpenStax books, free of charge. Through our Community Hubs, instructors can upload their own
materials or download resources to use in their own courses, including additional ancillaries, teaching
material, multimedia, and relevant course content. We encourage instructors to join the hubs for the subjects
most relevant to your teaching and research as an opportunity both to enrich your courses and to engage with
other faculty. To reach the Community Hubs, visit www.oercommons.org/hubs/openstax.

Technology partners
As allies in making high-quality learning materials accessible, our technology partners offer optional low-cost
tools that are integrated with OpenStax books. To access the technology options for your text, visit your book
page on OpenStax.org.

1 Introduction to Computer Science

Figure 1.1 Computing is everywhere, affecting everyone, for better and for worse. (credit: modification of "Whereas design is
expansive, engineering is narrowing" by Jessie Huynh/Critically Conscious Computing, CC0)

Chapter Outline
1.1 Computer Science
1.2 Computer Science across the Disciplines
1.3 Computer Science and the Future of Society

Introduction
This textbook will introduce you to the exciting and complex world of computer science. In this chapter, you’ll
review the history of computer science, learn about its use in different fields, and explore how computer
science will impact the future of society. Computer science is a powerful tool, and computer scientists have
used their vast knowledge to create and implement technology that has transformed societies
around the world.

This book will also introduce the computational thinking aspects of problem-solving and analytical thinking
that enable the study of algorithms, which are step-by-step instructions for solving specific problems or
carrying out computations. Therefore, this book also covers algorithms and their realization via programming
languages, computer systems architectures, networks, and operating systems. The book subsequently delves
into computer science areas that enable the design and development of software solutions using high-level
programming languages (i.e., coding languages designed to be more intuitive for humans), architectural styles
and related models, data management systems, and software engineering. Finally, the book demonstrates
how to leverage computer science realizations and areas to build modern end-to-end solutions to business
and social problems. In particular, the book focuses on modern web applications development, cloud-native
applications development, and hybrid Cloud/on-premise digital solutions. The various chapters emphasize
how to achieve software solution qualities such as performance and scalability. The last chapter explains how
to secure software applications in the context of various cyber threats. It also explains
how to make the right decisions about using computers and information in society to navigate social, ethical,
economic, and political issues that could result from the misuse of technology. To conclude this textbook, we’ll
introduce you to cybersecurity and help you understand why responsible computing is essential to promote
ethical behavior in computer science. The book is designed to help students grasp the full meaning of
computer science as a tool that can help them think, build meaningful solutions to complex problems, and
motivate their careers in information technology (IT).

You’re already familiar with computer science. Whenever you use a laptop, tablet, cell phone, credit card
reader, or other technology, you interact with items made possible by computer science. Computer science is
a challenging field, and the outputs of computer science offer many benefits for society. At the same time, we
have to be cautious about how we use computer science to ensure it impacts society in ethical ways. To help
you understand this, the next section will explain how computer science came to be and discuss the field’s
potential.

1.1 Computer Science


Learning Objectives
By the end of this section, you will be able to:
• Discuss the history that led to the creation of computer science as a field
• Define computer science
• Assess what computer science can do, as well as what it should not do

The field of computer science (CS) is the study of computing, which includes all phenomena related to
computers, such as the Internet. With foundations in engineering and mathematics, computer science focuses
on studying algorithms. An algorithm is a sequence of precise instructions that enables computing. This
includes components computers use to process information. By studying and applying algorithms, computer
science creates applications and solutions that impact all areas of society. For example, computer science
developed the programs that enable online shopping, texting with friends, streaming music, and other
technological processes.

While computers are common today, they weren’t always this pervasive. For those whose lives have been
shaped by computer technology, it can sometimes seem like computer technology is ahistorical: computing
often focuses on rapid innovation and improvement, wasting no time looking back and reflecting on the past.
Yet the foundations of computer science, defined between 50 and 100 years ago, very much shape
what is possible with computing today.

The Early History of Computing


The first computing devices were not at all like the computers we know today. They were physical calculation
devices such as the abacus, which first appeared in many societies across the world thousands of years ago.
They allowed people to tally, count, or add numbers (Figure 1.2). Today, abaci are still used in some situations,
such as helping small children learn basic arithmetic, keeping score in games, and as a calculating tool for
people with visual impairments. However, abaci are not common today because of the invention of number
systems such as the Arabic number system (0, 1, 2, 3, . . .), which included zero and place values that cannot be
computed with abaci. The concept of an algorithm was also invented around this time. Algorithms use inputs
and a finite number of steps to carry out arithmetic operations like addition, subtraction, multiplication, and
division, and produce outputs used in computing. Today’s computers still rely on the same foundations of
numbers, calculations, and algorithms, except at the scale of billions of numbers and billions of calculations
per second.

To introduce a concrete example of an algorithm, let us consider the binary search algorithm, which is used to
locate a number in a sorted array of integers efficiently. The algorithm operates by repeatedly dividing the
search interval in half. If the number being searched for is less than the integer in the middle of the interval,
the interval is narrowed to the lower half; otherwise, it is narrowed to the upper half. The algorithm repeats
this check until the number is found or the interval is empty.
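
One way to sketch this procedure in Python is shown below; the function name and the sample list are illustrative choices rather than material from the text.

    def binary_search(sorted_values, target):
        """Return the index of target in sorted_values, or -1 if it is absent."""
        low, high = 0, len(sorted_values) - 1
        while low <= high:                       # the interval is not yet empty
            middle = (low + high) // 2
            if sorted_values[middle] == target:
                return middle                    # found the number
            elif target < sorted_values[middle]:
                high = middle - 1                # narrow to the lower half
            else:
                low = middle + 1                 # narrow to the upper half
        return -1                                # interval became empty: not present

    print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))  # prints index 4

Because each comparison halves the interval, even a sorted array of a million integers requires at most about 20 comparisons.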

Algorithms may sound complicated, but they can be quite simple. For example, recipes to prepare food are
algorithms with precise directions for ingredient amounts, the process to combine these, and the
temperatures and cooking methods needed to transform the combined ingredients into a specific dish. The
dish is the output produced by following the algorithm of a recipe.

Figure 1.2 An abacus is one of the first calculators. (credit: “Traditional Chinese abacus illustrating the suspended bead use” by
Jccsvq/Wikimedia Commons, CC0)

The next major development in the evolution of computing occurred in 1614 when John Napier, a Scottish
mathematician, developed logarithms, which express exponents by denoting the power to which a number must
be raised to obtain another value. Logarithms provided a shortcut for tedious calculations and became
the foundation for multiple analog calculating machines invented during the 1600s.
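
A short Python sketch (a modern illustration, not part of the historical account) shows the shortcut in action: because log(a) + log(b) = log(a × b), a multiplication can be replaced by an addition of logarithms followed by exponentiation.

    import math

    # Multiply two numbers by adding their logarithms, the trick behind Napier's tables.
    a, b = 3847.0, 9213.0
    via_logs = math.exp(math.log(a) + math.log(b))
    print(via_logs)  # approximately 35442411.0
    print(a * b)     # 35442411.0, the same result up to rounding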

Scientists continued to explore different ways to speed up or automate calculations. In the 1820s, English
mathematician Charles Babbage invented the Difference Engine with the goal of preventing human errors in
manual calculations. The Difference Engine provided a means to automate the calculations of polynomial
functions and astronomical calculations.

Babbage followed the Difference Engine with his invention of the Analytical Engine, developed with assistance
from Ada Lovelace. The Analytical Engine was program-controlled and included features like an integrated
memory and an arithmetic logic unit. Lovelace used punched cards to create sequencing instructions that could be read by
the Analytical Engine to automatically perform any calculation included in the programming code. With her
work on the Analytical Engine, Lovelace became the world’s first computer programmer.

The next major development in computing occurred in the late 1800s when Herman Hollerith, an employee of
the U.S. Census Office, developed a machine that could punch cards and count them. In 1890, Hollerith’s
invention was used to tabulate and prepare statistics for the U.S. census.

By the end of the 1800s and leading into the early 1900s, calculators, adding machines, typewriters, and
related machines became more commonplace, setting the stage for the invention of the computer. In the
1940s, multiple computers became available, including IBM’s Harvard Mark 1. These were the forerunners to
the advent of the digital computer in the 1950s, which changed everything and evolved into the computers
and related technology we have today.

Around this time, computer science emerged as an academic discipline rooted in the principles of
mathematics, situated primarily in elite institutions, and funded by demand from the military for use in missile
guidance systems, airplanes, and other military applications. As computers could execute programs faster
than humans, computer science replaced human-powered calculation with computer-powered problem-
solving methods. In this way, the earliest academic computer scientists envisioned computer science as a
discipline that was far more intellectual and cognitive compared to the manual calculation work that preceded
it.

Richard Bellman was a significant contributor to this effort. A mathematics professor at Princeton and later at
Stanford in the 1940s, Bellman went on to work for the RAND Corporation, where he studied the theory of
multistage decision processes.[1] In 1953, Bellman invented dynamic programming, which is a mathematical
optimization methodology and a technique for computer programming. With dynamic programming, complex
problems are divided into more manageable subproblems. Each subproblem is solved, and the results are
stored, ultimately resulting in a solution to the overall complex problem.[2] With this approach, Bellman helped
revolutionize computer programming and enable computer science to become a robust field.
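
As a brief illustration of the technique (a Python sketch using the Fibonacci numbers, an example chosen here rather than drawn from the text), each subproblem is solved once and its result is stored for reuse:

    def fibonacci(n, memo=None):
        """Compute the nth Fibonacci number by storing solved subproblems."""
        if memo is None:
            memo = {}
        if n <= 1:
            return n
        if n not in memo:                 # solve each subproblem only once
            memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
        return memo[n]

    print(fibonacci(40))  # 102334155, computed without repeating shared subproblems

Without the stored results, the same subproblems would be recomputed an exponential number of times.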

What Is Computer Science?


The term computer science was popularized by George E. Forsythe in 1961. A mathematician who founded
Stanford University’s computer science department, Forsythe defined computer science as “the theory of
programming, numerical analysis, data processing, and the design of computer systems.” He also argued that
computer science was distinguished from other disciplines by the emphasis on algorithms, which are essential
for effective computer programming.[3]

Computer science is not only about the study of how computers work, but also everything surrounding
computers, including the people who design computers, the people who write programs that run on
computers, the people who test the programs to ensure correctness, and the people who are directly and
indirectly affected by computers. In this way, computer science is as much about people and how they work
with computers as it is about just computers.

Not everyone agrees with this definition. Some people argue that computer science is more about computers
or software than the people it affects. However, even if we were to study just the “things” of computer science,
the people are still there. When someone designs a computer system, they are thinking about what kinds of
programs people might want to run. Typically, effort is made to design the computer system so it is more
efficient at running certain kinds of programs. A computer optimized for calculating missile trajectories, for
example, won’t be optimized for running social media apps.

Many computing innovations were initially developed for military research and communication purposes,
including the predecessor to the Internet, the ARPANET (Figure 1.3).

[1] S. Golomb, “Richard E. Bellman 1920–1984,” n.d. https://www.nae.edu/189177/RICHARD-E-BELLMAN-19201984
[2] Geeks for Geeks, “Dynamic Programming or DP,” 2024. https://www.geeksforgeeks.org/dynamic-programming/
[3] D. E. Knuth, “George Forsythe and the Development of Computer Science,” Communications of the ACM, vol. 15, no. 8, pp. 722–723, 1972. https://dl.acm.org/doi/pdf/10.1145/361532.361538

Figure 1.3 The ARPANET, circa 1974, was an early predecessor to the Internet. It allowed computers at Pentagon-funded research
facilities to communicate over phone lines. (credit: modification of "Arpanet 1974" by Yngvar/Wikipedia, Public Domain)

What Is a Computer?
While computer science is about much more than just computers, it helps to know a bit more about computers
because they are an important component of computer science. All computers are made of physical, real-
world material that we refer to as hardware. Hardware, which has four components (processor, memory,
network, and storage), is the part of the computer that enables computations. The processor can
be regarded as the computer’s “brain,” as it follows instructions from algorithms and processes data. The
memory is a means of addressing information in a computer by storing it in consistent locations, while the
network refers to the various technological devices that are connected and share information. The hardware
and physical components of a computer that permanently house a computer’s data are called storage.

One way to understand computers is from a hardware perspective: computers leverage digital electronics and
the physics of materials used to develop transistors. For example, many of today’s computers rely on the
physical properties of a brittle, crystalline metalloid called silicon, which makes it suitable for representing
information. The batteries that power many of today’s smartphones and mobile devices rely on lithium, a soft,
silvery metal mostly harvested from minerals in Australia, Zimbabwe, and Brazil, as well as from continental
brine deposits in Chile, Argentina, and Bolivia. Computer engineers combine these substances to build
circuitry and information pathways at the microscopic scale to form the physical basis for modern computers.

However, the physical basis of computers was not always silicon. The Electronic Numerical Integrator and
Computer (ENIAC) was completed in 1945, making it one of the earliest digital computers. The ENIAC operated
on different physical principles. Instead of silicon, the ENIAC used the technology of a vacuum tube, a physical
device like a light bulb that was used as memory in early digital computers. When the “light” in the vacuum
tube is off, the vacuum tube represents the number 0. When the “light” is on, the vacuum tube represents the
number 1. When thousands of vacuum tubes are combined in a logical way, we suddenly have memory. The
ENIAC is notable in computer history because it was the first general-purpose computer, meaning that it could
run not just a single program but rather any program specified by a programmer. The ENIAC was often run
and programmed by women programmers (Figure 1.4). Despite its age and differences in hardware properties,
it shares a fundamental and surprising similarity with modern computers. Anything that can be computed on
today’s computers can also be computed by the ENIAC given the right circumstances—just trillions of times
more slowly.
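
A tiny Python sketch (an analogy, not a description of the ENIAC's actual circuitry) shows how a row of on/off elements can be read as a number, which is essentially what a memory cell does:

    def tubes_to_number(tubes):
        """Interpret a list of 0/1 on-off states as an unsigned binary integer."""
        value = 0
        for state in tubes:         # most significant "tube" first
            value = value * 2 + state
        return value

    print(tubes_to_number([1, 0, 1, 1]))  # 8 + 0 + 2 + 1 = 11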

Figure 1.4 This image depicts women programmers holding boards used in computers such as the ENIAC, many of which were
designed expressly for ballistics and ordnance guidance research. Today, these room-size computers can be reproduced at a literally
microscopic scale—basically invisible to the human eye. (credit: modification of "Women holding parts of the first four Army
computers" by U.S. Army/Wikimedia Commons, Public Domain)

How is this possible? The algorithmic principles that determine how results are computed make up software.
Almost all computers, from the ENIAC to today’s computers, are considered Turing-complete (or
Computationally Universal, as opposed to specialized computing devices such as scientific calculators) because
they share the same fundamental model for computing results and every computer has the ability to run any
algorithm. Alan Mathison Turing was an English mathematician who was highly influential in the development
of theoretical computer science, which focuses on the mathematical processes behind software, and provided
a formalization of the concepts of algorithm and computation with the Turing machine. A Turing-complete
computer stores data in memory (either using vacuum tubes or silicon) and manipulates that data according
to a computer program, which is an algorithm that can be run on a computer. These programs are
written in a programming language, a set of symbols and instructions that can be interpreted by the
computer. Programs are also stored in memory, which allows
programmers to modify and improve programs by changing the instructions.

While both hardware and software are important to the practical operation of computers, computer science’s
historical roots in mathematics also emphasize a third perspective. Whereas software focuses on the program
details for solving problems with computers, theoretical computer science focuses on the mathematical
processes behind software. The idea of Turing-completeness is a foundational concept in theoretical computer
science, which considers how computers in general—not just the ENIAC or today’s computers, but even
tomorrow’s computers that we haven’t yet invented—can solve problems. This theoretical perspective expands
computer science knowledge by contributing ideas about (1) whether a problem can be computed by a Turing-
complete computer at all, (2) how that problem might be computed using an algorithm, and (3) how quickly or
efficiently a computer can run such an algorithm. The answers to these questions suggest the limits of what
we can achieve with computers from a technical perspective: Using mathematical ideas, is it possible to use a
computer to compute all problems? If the answer to a problem is yes, how much of a computing resource is
needed to get the answer?

Clearly both humans and computers have their strengths and limitations. An example of a problem that
humans can solve but computers struggle with is interpreting subtle emotions or making moral judgments in
complex social situations. While computers can process data and recognize patterns, they cannot fully
understand the nuances of human emotions or ethics, which often involve context, empathy, and experience.

On the flip side, there are tasks that neither computers nor humans can perform, such as accurately predicting
chaotic systems like long-term weather patterns. Artificial intelligence (AI) refers to computer functions that
perform tasks usually performed by human intelligence, such as visual perception and decision-making. Despite
advancements in AI, these problems remain beyond our collective reach due to the inherent unpredictability
and complexity of certain natural systems.

Theoretical computer science is often emphasized in undergraduate computer science programs because
academic computer science emerged from mathematics, often to the detriment of perspectives that center on
the social and technical values embodied by applications of computer technology. These perspectives,
however, are gradually changing. Just as the design of ARPANET shaped the design of the Internet, computer
scientists are also learning that the physical aspects of computer hardware determine what can be efficiently
computed. For example, many of today’s artificial intelligence technologies rely on highly specialized computer
hardware that is fundamentally different at the physical level compared to the general-purpose programmable
silicon that has been the traditional focus of computer science. Organizations that develop human-computer
interaction (HCI), a subfield of computer science that emphasizes the social aspects of computation, now host
annual conferences that bring together thousands of researchers in academia and professionals in the
industry. Computer science education is another subfield that emphasizes the cognitive, social, and communal
aspects of learning computer science. Although these human-centered subfields are not yet in every computer
science department, their increasing representation reflects computer scientists’ growing desire to serve not
only more engaged students, but also a more engaged public in making sense of the values of computer
technologies.

The Capabilities and Limitations of Computer Science


Computers can be understood as sources, tools, and opportunities for changing social conditions. Many
people have used computer science to achieve diverse goals beyond this dominant vision for computer
science. For example, consider computers in education.

Around the same time that the ARPANET began development in the late 1960s, Wally Feurzeig, Seymour
Papert, and Cynthia Solomon designed the LOGO programming language to enable new kinds of computer-
mediated expression and communication. Compared to contemporary programming languages such as
FORTRAN (FORmula TRANslation System) that emphasized computation toward scientific and engineering
applications, LOGO is well known for its use of turtle graphics, whereby programs were used to control the
actions of a digital turtle using instructions such as moving forward some number of units and turning left or
right some number of degrees. Papert argued that this turtle programming enabled body-syntonic reasoning,
a kind of experience that could help students learn concepts more effectively. In mathematics, students could
learn angles, distance, and geometric shapes by instructing the turtle to draw them. In physics, they could
construct their own understandings by reasoning through the physical motion of turtle programs: velocity, by
repeating commands that move the turtle forward the same amount; acceleration, by making the turtle move
forward in increasing amounts; and even friction, by having the turtle slow down by moving forward in
decreasing amounts. In this way, computers could not only further education in computer science, but also
offer new, more dynamic ways to learn other subjects. Papert’s ideas have been expanded beyond mathematics
and physics to areas such as the social sciences, where interactive data visualization can help students identify
interesting correlations and patterns that precipitated social change and turning points in history, while also
learning new data fluencies and the limits of data-based approaches.[4]

[4] B. Naimipour, M. Guzdial, and T. Shreiner, “Helping Social Studies Teachers to Design Learning Experiences Around Data: Participatory Design for New Teacher-Centric Programming Languages,” in Proceedings of the 2019 ACM Conference on International Computing Education Research (ICER ’19), Association for Computing Machinery, New York, NY, USA, 2019, p. 313. https://doi.org/10.1145/3291279.3341211
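
Python's standard turtle module is a descendant of this idea, and a minimal sketch (the square-drawing program below is illustrative and needs a graphical display to run) shows how body-syntonic the commands are:

    import turtle

    # Draw a square the way a LOGO turtle would: walk forward, turn left, repeat.
    pen = turtle.Turtle()
    for _ in range(4):
        pen.forward(100)   # move forward 100 units
        pen.left(90)       # turn left 90 degrees

    turtle.done()          # keep the drawing window open until it is closed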

Yet despite these roots in aspirations for computers as a medium for learning anything and everything, the
study of computer science education emerged in the 1970s as a field narrowly concerned with producing more
effective software engineers. Higher-education computer science faculty, motivated by the demand for
software engineers, designed their computer science curricula to teach the concepts that early computer
companies such as IBM desperately needed. These courses had an emphasis on efficiency, performance, and
scalability, because a university computer science education was only intended to produce software engineers.
We live with the consequences of this design even today: the structure of this textbook inherits the borders
between concepts originally imagined in the 1970s when university computer science education was only
intended to prepare students for software development jobs. We now know that there are many more roles for
computer scientists to play in society—not only software engineers, but also data analysts, product managers,
entrepreneurs, political advisors or politicians, environmental engineers, social activists, and scientists across
every field from accounting to zoology.

Although the role of computers expanded with the introduction of the Internet in the late 1990s, Papert’s
vision for computation as a learning medium has been challenging to implement, at least partly because of
funding constraints. But as computers evolve, primary and secondary education in the United States is striving
for ways to help teachers use computers to more effectively teach all things—not just computers for their own
sake, but using computers to learn everything.

Computers and Racial Justice


Our histories so far have centered the interests of White American men in computer science. But there are also
countless untold, marginalized histories of people of other backgrounds, races, ethnicities, and genders in
computing. The book and movie Hidden Figures shares the stories of important Black women who were not
only human computers, but also some of the first computer scientists for the early digital computers that
powered human spaceflight at NASA (Figure 1.5).

Figure 1.5 Katherine Johnson, a Black computer scientist, recalculated the computations done by early digital computers for space
flight planning at NASA. Her contributions were portrayed in the book and movie Hidden Figures. (credit: “Katherine Johnson at
NASA, in 1966” by NASA/Wikimedia Commons, Public Domain)

Access for free at openstax.org



In one chapter of Black Software, Charlton McIlwain shares stories from “The Vanguard” of Black men and
women who made a mark on computer science in its early years from the 1950s through the 1990s through
the rise of personal computing and the Internet, but whose histories have largely been erased by the
dominant Silicon Valley narratives. Their accomplishments include leading computer stores and developing
early Internet social media platforms, news, and blog websites. For example, Roy L. Clay Sr., a member of the
Silicon Valley Engineering Hall of Fame, helped Hewlett-Packard develop its first computer lab and create the
company’s first computers. Later, Clay provided information to venture capitalists that motivated them to
invest in start-ups such as Intel and Compaq.5 In another example, Mark Dean was an engineer for IBM whose
work was instrumental in helping IBM develop the Industry Standard Architecture (ISA) bus, which created a
method of connecting a computer’s processor with other components and enabling them to communicate.
This led to the creation of PCs, with Dean owning three of the nine patents used to create the original PC.6

Yet their efforts were often hampered by the way that computer science failed to center, or even
accommodate, Black people. Historically, American Indians and Hispanic people did not have the same access
as even Black Americans to computers and higher education. Kamal Al-Mansour, a technical contract
negotiator at the NASA Jet Propulsion Lab, worked on space projects while Ronald Reagan was president. He
recounts:

“It was conflicting . . . doing a gig . . . supporting missiles in the sky, (while) trying to find my own identity and
culture . . . JPL was somewhat hostile . . . and I would come home each day [thinking] What did I accomplish
that benefited people like me? And the answer every day would be ‘Nothing.’”7

Al-Mansour would go on to start a new company, AfroLink, finding purpose in creating software that centered
on Black and African history and culture. This story of computer technologies in service of African American
communities is reflected in the creation of the AfroNet (an early social media network for connecting Black
technologists) and NetNoir (a website that sought to popularize Black culture). These examples serve as
early indicators of the ways that Black technologists invented computer technologies for Black people in the
United States. Yet Black Software also raises challenging political implications of the historical exclusion of
Black technologists. Black culture on the Internet has greatly influenced mainstream media and culture in the
United States, but these Black cultural products are ultimately driving attention and money to dominant
platforms such as X and TikTok rather than those that directly benefit Black people, content creators, and
entrepreneurs. Computer technologies risk reproducing social inequities through the ways in which they
distribute benefits and harms.

The digital divide has emerged as a significant issue, as many aspects of society, including education,
employment, and social mobility, become tied to computing, computer science, and connectivity. The divide
refers to the uneven and unequal access and distribution of technology across populations from different
geographies, socioeconomic statuses, races, ethnicities, and other differentiators. While technological access
generally improves over time, communities within the United States and around the world have different levels
of access to high-speed Internet, cell towers, and functioning school computers. Unreliable electricity can also
play a significant role in computer and Internet usage. And beyond systemic infrastructure-based differences,
individual product or service access can create a divide within communities. For example, if powerful AI-based
search and optimization tools are only accessible through high-priced subscriptions, specific populations can
be limited in benefiting from those tools.

5 J. Dreyfuss, “Blacks in Silicon Valley,” 2011. https://www.theroot.com/blacks-in-silicon-valley-1790868140


6 IBMers, “Mark Dean,” n.d. https://www.ibm.com/history/mark-dean
7 C. D. McIlwain, Black Software: The Internet and Racial Justice, from the AfroNet to Black Lives Matter, New York: Oxford
University Press, 2019.

GLOBAL ISSUES IN TECHNOLOGY

H-1B Visas Address Worker Shortages


According to the U.S. Bureau of Labor Statistics (BLS), by 2033, the number of jobs available for computer
and information research scientists is expected to increase by 26%. This is much faster job growth than the
average expected in total for all occupations. BLS predicts that this growth will result in about 3,400 job
openings per year for computer and information research scientists.8

To fill some of these jobs, U.S. employers likely will continue to rely on H-1B visas. This visa enables
employers to recruit well-educated professionals from other countries. These professionals temporarily
reside in the United States and work in specialty occupations, like computer science, that require a
minimum education of a bachelor’s degree or its equivalent.9 To participate in the visa program, employers
must register and file a petition to hire H-1B visa holders. Each year, the U.S. Citizenship and Immigration
Services accepts applications from individuals from other countries who compete for a pool of 65,000 visa
numbers, as well as an additional pool of 20,000 master’s exemption visa numbers; visas awarded in a given
year are valid for a period of three years. At the end of three years, employers can petition to have each
worker’s visa extended for three additional years.10 This program helps U.S. employers fill vacancies in
many fields, including computer science, while providing job opportunities for highly skilled workers around
the world.

Computers and Global Development


Computer technology, like any other cutting-edge technology, changes the balance of power in society. But
access to new technologies is rarely ever equal. Computer science has improved the quality of life for many
people who have access to computer technology and the means of controlling it to serve their interests. But
for everyone else in the world, particularly people living in the Global South, computer technologies need
context-sensitive designs to meet their needs. In the 1990s, for instance, consumer access to the Internet was
primarily based on “dial-up” systems that ran on top of public telephone network systems. Yet many parts of
the world, even today, lack telephone coverage, let alone Internet connectivity. Research in computers for
global development aims to improve the quality of life for people all over the world by designing computer
solutions for low-income and underserved populations—not just those living in the wealthiest
countries.

Computer technologies for global development require designing around unique resource constraints such as
a lack of reliable power, limited or nonexistent Internet connectivity, and low literacy. Computer scientists
employ a variety of methods drawing from the social sciences to produce effective solutions. However,
designing for diverse communities is difficult, particularly when the designers have little direct experience with
the people they wish to serve. In The Charisma Machine, Morgan Ames criticizes the One Laptop Per Child
(OLPC) project, a nonprofit initiative announced in 2005 by the Massachusetts Institute of Technology Media
Lab. The project attempted to bring computer technology in the form of small, sturdy, and cheap laptops that
were powered by a hand crank to children in the Global South. Based on her fieldwork in Paraguay, Ames
argues that the project failed to achieve its goals for a variety of reasons, such as electricity infrastructure
problems, hardware reliability issues, software frustrations, and a lack of curricular materials. Ames argues
that “charismatic technologies are deceptive: they make both technological adoption and social change appear
straightforward instead of as a difficult process fraught with choices and politics.” When the computers did

8 U.S. Bureau of Labor Statistics, “Computer and Information Research Scientists: Job Outlook,” 2024. https://www.bls.gov/ooh/
computer-and-information-technology/computer-and-information-research-scientists.htm#tab-6
9 U.S. Citizenship and Immigration Services, “H-1B Specialty Occupations,” 2024. https://www.uscis.gov/working-in-the-united-
states/h-1b-specialty-occupations
10 American Immigration Council, “The H-1B Visa Program and Its Impact on the U.S. Economy,” 2024.
https://www.americanimmigrationcouncil.org/research/h1b-visa-program-fact-sheet




work, OLPC’s vision for education never truly materialized because children often used the computers for their
own entertainment rather than the learning experiences the designers intended. Though Ames’s account of
the OLPC project (Figure 1.6) itself has been criticized for presenting an oversimplified narrative, it still
represents a valuable argument for the risks and potential pitfalls associated with designing technologies for
global development: technology does not act on its own but is embedded in a complicated social context and
history.

Figure 1.6 In 2005, MIT’s Media Lab started the OLPC initiative to bring laptops to children in the Global South. An unexpected
outcome they discovered was that designing technologies for global communities is not as straightforward as designers may initially
believe. (credit: "One Laptop per Child" by OLE Nepal Cover/Flickr, CC BY 2.0)

THINK IT THROUGH

Internet Commerce
Many companies offer products and services over the Internet. While online shopping provides
additional sales opportunities for businesses and gives consumers a convenient way to shop, it is
not without risks. For example, online businesses and their shoppers may be victims of data breaches and
identity theft. Other risks include fake reviews that motivate consumers to make a purchase, phishing that
leads to hacking, and fake online stores that take consumers’ money without delivering a product. What can
we do to mitigate the risks and dangers of online shopping?

Addressing these risks is not as simple as practicing humility and including communities in the design process.
Many challenges in computing for global development are sociopolitical or technopolitical rather than purely
technical. For example, carrying out a pilot test to evaluate the effectiveness of a design can appear as
favoritism toward the pilot group participants. These issues and social tensions are especially exacerbated in
the Global South, where the legacies of imperialism and racial hierarchies continue to produce or expand
social inequities and injustices.

The identities of people creating computer technologies for global development are ultimately just as
important as the technologies they create. In Design Justice, Sasha Costanza-Chock reiterates the call for
computer scientists to “build with, not for,” the communities they wish to improve. In this way, Design Justice
seeks to address the social justice tensions raised when asking the question, “Who does technology ultimately
benefit?” by centering the ingenuity of the marginalized “user” rather than the dominant “designer.”

In some cases, underdeveloped countries can quickly catch up without spending the money that was invested
to develop the original technologies. For example, we can set up ad hoc networks quickly today, and at a
fraction of the cost, in Middle Eastern and African countries using technology that was developed (at a high
cost) in the United States and Europe over the past several decades. This means that sometimes, progress in
one part of the world can be shared with another part of the world, enabling that area to quickly progress and
advance technologically.

LINK TO LEARNING

The Design Justice Network (https://openstax.org/r/76DesignJust) is an organization that aims to advance
the principles of design justice and to include people who are marginalized in the technology design
process.

1.2 Computer Science across the Disciplines


Learning Objectives
By the end of this section, you will be able to:
• Differentiate between discovery and invention
• Describe how science, mathematics, and engineering each play a role in computer science
• Discuss how data science, computational science, and information science each relate to computer
science
• Explain why the various areas of computer science are synergistic

Computer science is an incredibly diverse field not because of what it can achieve on its own but because of
how it contributes to every other field of human knowledge and expertise. From its early days, it was
understood that there would be cross-collaboration between computer scientists and colleagues in other
disciplines. Today, almost all modern technologies either depend on computer technologies or benefit
significantly from them. Computer technologies and the study of computer science have reshaped almost all
facets of life today for everyone.

Data Science
Across business, financial, governmental, scientific, and nonprofit workplaces, millions of people are
programming, and most of the time, they don’t even know it! A spreadsheet is an example of a data-centric
programming environment where data is organized into cells in a table. Instead of presenting programs as
primarily about algorithms, spreadsheets present programs as primarily about data. Spreadsheets are often
used for data analysis by offering a way to organize, share, and communicate ideas about data. Spreadsheets
are uniquely effective and accessible because they allow for the visual organization of data in whichever
structure makes the most sense to the user. Instead of hiding data behind code, spreadsheets make data as
transparent and up to date as possible.

Although spreadsheets make computation accessible for millions of people across the world, they have several
shortcomings. Many popular spreadsheet products, such as Microsoft Excel, limit the number of rows of data
a worksheet can store, and that limit is often far smaller than the datasets modern computers can otherwise
handle. One such example occurred in October 2020, when Public Health England failed to report 15,841
positive cases of COVID-19 in the United Kingdom due to mismanaged row limits in the spreadsheet used. This
shortcoming attests not only to the technical limit on the number of rows supported by spreadsheets, but also
to the design limitations of software that fails to communicate data loss, irregularities, or errors to users.
Errors in spreadsheet software data entry can often go unnoticed because spreadsheets do not enforce data
types. Cells can contain any content: numbers, currencies, names, percentages, labels, and legends. The




meaning of a cell is determined largely by the user rather than the software. Spreadsheets are an expressive
and accessible technology for data analysis, but this creative power that spreadsheets afford to users is the
very same power that limits spreadsheets as a data management and large-scale data analysis tool. The more
data and the more people involved in a spreadsheet, the greater the potential for spreadsheet problems.
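To make the contrast concrete, the short sketch below uses Python and the pandas library instead of a spreadsheet. The file name and column names are hypothetical; the point is only that a programmatic tool has no fixed worksheet row ceiling (current Excel worksheets, for example, stop at 1,048,576 rows) and can enforce a declared data type for each column.

import pandas as pd

# A minimal sketch, assuming a hypothetical file "cases.csv" with columns
# case_id, patient_age, and reported_date. Unlike a worksheet, the number of
# rows is limited only by available memory, and each column is given an
# explicit, enforced data type when the file is loaded.
cases = pd.read_csv(
    "cases.csv",
    dtype={"case_id": "string", "patient_age": "Int64"},
    parse_dates=["reported_date"],
)

print(len(cases))       # row count is not capped by a worksheet limit
print(cases.dtypes)     # every column carries a declared data type

A sketch like this trades the visual immediacy of a spreadsheet for explicitness: the types and loading steps are written down, so silent truncation or mistyped cells are less likely to go unnoticed.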

The interdisciplinary field that applies computing to managing data and extracting information from data is
called data science. Data scientists are practitioners who combine computing and data analysis skills with the
domain knowledge specific to their field or business. The demand for data scientists continues to grow as
more and more research and business contexts involve analyzing big data, or very large datasets
that are not easily processed using spreadsheets. These datasets often involve high-volume measurements of
user interactions on the Internet at a very fine grain, such as tracking a customer’s web browser history across
an online storefront. Data scientists can then analyze browser patterns using machine learning methods in
order to recommend related products, target advertisements to specific customers over social media, and
reengage customers over email or other means. Machine learning (ML) is a subset of artificial intelligence that
relies on algorithms and data to enable computer systems to learn, mimicking the way
humans learn. For example, ML is used to identify fraudulent versus legitimate banking transactions. Once a
computer learns how to distinguish fraudulent transactions, it can flag suspicious
banking activity and call attention to it.
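The sketch below illustrates this fraud-detection idea using the scikit-learn library; the transaction features (amount, hour of day, and distance from the cardholder's home) and the tiny set of labeled examples are invented for illustration, and a real system would use far richer data, but the pattern of learning from labeled examples and then scoring new transactions is the same.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data: [amount in dollars, hour of day, distance from home in km]
X_train = np.array([
    [12.50, 14, 2],      # small daytime purchase close to home -> legitimate
    [9.99, 10, 1],       # legitimate
    [45.00, 18, 5],      # legitimate
    [980.00, 3, 4200],   # large purchase, middle of the night, far away -> fraudulent
    [1500.00, 2, 3800],  # fraudulent
    [760.00, 4, 5100],   # fraudulent
])
y_train = np.array([0, 0, 0, 1, 1, 1])   # labels: 0 = legitimate, 1 = fraudulent

# Learn a simple decision boundary from the labeled examples
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score a new, unseen transaction
new_transaction = np.array([[1250.00, 3, 4500]])
print(model.predict(new_transaction))         # expected: [1], flagged as likely fraud
print(model.predict_proba(new_transaction))   # estimated probability of each class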

GLOBAL ISSUES IN TECHNOLOGY

Targeted Advertising
Although data scientists can produce immense value for business and research alike, their work also raises
significant social concerns. For example, web browser history tracking enables companies to target
advertising of products to people and also allows for targeting of political advertisements. In Antisocial
Media, Siva Vaidhyanathan argues that the “impact of Facebook on democracy is corrosive [because
political campaigns] can issue small, cheap advertisements via platforms like Facebook and Instagram that
may target groups as small as twenty, and then disappear, so they are never examined or debated.” This
undermines the process of discussions among voters in democracies like the United States, as well as
countries like Germany, the United Kingdom, and Spain, which spent the most on targeted political advertising
on Facebook in the spring of 2019.11 Given their lack of transparency, such ads are a questionable practice.
Another topic worth mentioning here is targeted advertising toward children.12 It is important
to consider the ethical implications of using the data collected through tracking, especially when it comes to
targeting at-risk populations. It raises questions about the accountability of platforms and advertisers in
safeguarding users’ rights and ensuring transparency in how data is used for these purposes.

Computational Science
Beyond data science, computer science can also fundamentally change how science is researched and
developed. The field of computational science refers to the application of computing concepts and
technologies to advance scientific research and practical applications of scientific knowledge in a wide range of
fields, including civil engineering, finance, and medicine (among many others). For example, algorithms and
computer software play a key role in enabling numerical weather prediction (Figure 1.7), the use of
mathematical models to forecast weather based on current conditions, which assists people’s everyday
lives and contributes to our understanding of climate models, climate change, and climate catastrophes.
These algorithms may rely on an amount of computing power that might not be available in a

11 Statista, “European Elections: Countries that spent the most on targeted political advertising on Facebook from March 1 to May
26, 2019*,” 2019. https://www.statista.com/statistics/1037329/targeted-political-ad-spend-on-facebook-by-eu-countries/
12 Maya Brownstein. Harvard study is first to estimate annual ad revenue attributable to young users of these platforms. January 2,
2024. https://news.harvard.edu/gazette/story/2024/01/social-media-platforms-make-11b-in-ad-revenue-from-u-s-teens/

single system, so the work may need to be distributed across many computers. Computational science studies
methods for realizing these algorithms and computer software.

Figure 1.7 Meteorologists collect data from a variety of sources and use the data, algorithms, and computers to predict the weather.
(data source: Climate Forecast System, National Centers for Environmental Information, National Oceanic and Atmospheric
Administration, https://www.ncei.noaa.gov/products/weather-climate-models/climate-forecast-system; credit: modification of "CFSR
Atmospheric Precipitable Water" by NOAA/ncei.noaa.gov, Public Domain)

INDUSTRY SPOTLIGHT

Computer Science and Climate Change


Computer science fights climate change and limits the impacts of climate catastrophes by enabling
technologies for decarbonization through power consumption optimization and advancing renewable
energy sources. Numerical weather forecasting not only supports our everyday lives but also helps climate
scientists determine the precise locations for wind turbines and simulate how they should be designed to
enable the greatest energy production. To support the power grid, data science methods can help predict
peak power consumption, optimize power sources to produce exactly the right amount of power needed,
and adjust power storage to reduce the amount of energy that needs to be generated from nonrenewable
sources. Computer models and algorithms assist energy engineers in optimizing building air conditioning
and power demands so that they efficiently serve the people living, working, and playing within them.

Although computer science has been used to support scientific discovery, the theory of knowledge of
computer science has historically been considered quite different from that of the natural sciences, such as
biology, physics, and chemistry. Computer science does not study natural objects, so to most, it would not be
considered a natural science but rather an applied science. Unlike natural sciences such as biology, physics,
and chemistry, which emphasize the discovery of natural phenomena, computer science often emphasizes
invention or engineering.

However, computer science is today deeply interdisciplinary and involves methods from across science,
mathematics, and engineering. Computer scientists design, analyze, and evaluate computational structures,
systems, and processes.

• Mathematics plays a key role in theoretical computer science, which emphasizes how a computational
problem can be defined in mathematical terms and whether that mathematical problem can be efficiently




solved with a computer.


• Engineering plays a key role in software engineering, which emphasizes how problems can be solved
with computers as well as the practices and processes that can help people design more effective software
solutions.
• Science plays a key role in human-computer interaction, which emphasizes experimentation and
evaluation of the interface (boundary) between humans and computers, often toward designing better
computer systems.

Information Science
Not only is computation interdisciplinary, but other disciplines are also becoming more and more
computational. In The Invisible Future, Nobel Laureate biologist David Baltimore defines DNA in computational
terms. He states that biology is an information science because DNA encodes for the outputs of biological
systems. The interdisciplinary field studying information technologies and systems as they relate to people,
organizations, and societies is called information science. The role of information in natural sciences can also
be found in the physics of quantum waves that carry information about physical effects, in the chemical
equations that specify information about chemical reactions, in the information flows that drive the evolution
of economies and political organizations, and in the information processes underlying social, management,
and communication sciences.13

CONCEPTS IN PRACTICE

Computer Science and DNA


Research into DNA sequencing and indexing is opening new ways of helping medical providers offer
personalized treatments for patients. Large-scale genome sequencing of not only the human genome, but
also the DNA signatures for viruses has made it possible for medical providers to take human fluid samples
and analyze them for the presence of infectious diseases. This research requires computer science
concepts, including specialized medical computer devices to sequence the billions of nucleotides that form
a DNA sequence, data structures and algorithms to efficiently process and identify DNA signatures, and the
miniaturization of computer hardware so that this technology is accessible (both in terms of price and
physical size) in more and more care centers.

Although information science has its roots in information classification, categorization, and management in
the context of library systems, information science today is a broad field that encompasses the many diverse
ways information shapes society. For example, today’s social media networks provide more personalized and
instantaneous communication of information than traditional news outlets—billions of people around
the world are using social media to engage with information about the world. For many people, social media
may be the primary way that they learn about and make sense of the world (Figure 1.8). Yet, we’ve already
seen risks associated with information technologies such as the Internet. In today’s “information age,”
information has more power than ever before to reshape society. Information scientists, data scientists,
computational scientists—and, therefore, computer scientists—have a social responsibility: “One cannot reap
the reward when things go right but downplay the responsibility when things go wrong.”14

13 P. J. Denning, “Computing is a natural science,” Commun. ACM, vol. 50, no. 7, pp. 13–18, July 2007. https://doi.org/10.1145/
1272516.1272529.
14 R. Benjamin, “Race after technology: Abolitionist tools for the new Jim code,” 2019, Polity.

Figure 1.8 While Americans used to primarily get their news from newspapers, as technology has advanced their primary source of
news media has shifted. As of 2020, 53% of American adults surveyed stated they got their news from social media at least some of
the time. (data source: Elisa Shearer and Amy Mitchell, Pew Research Center. "About Half of Americans Get News on Social Media at
Least Sometimes." From Survey of U.S. adults conducted Aug. 31–Sept. 7, 2020. In: E. Shearer, A. Mitchell, "News Use Across Social
Media Platforms in 2020," Jan 12, 2021.; attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Despite the centrality of information to decision-making and social change, dominant approaches to computer
science tend to focus on computational structures, systems, and processes (such as algorithms) that describe
one kind of information, emphasizing the what and how of solving problems with computers but less often the
why or the who. Information science broadly centers people, organizations, and society in the study
of information technologies.

Computer Science Is an Interdisciplinary Field


In presenting data science, computational science, and information science, we’ve introduced the idea that
computer science can shape other disciplines. But we’ve also raised questions about what computer science is
today. If computer science is the study of all “phenomena surrounding computers,” it could also involve data
science, computational science, bioinformatics, cheminformatics, computational social science, medical
informatics, and information science. As you will learn in Chapter 13 Hybrid Multicloud Digital Solutions
Development, another aspect of computer science is responsible computing, which includes the appropriate
management of cyber resources as well as robust cybersecurity. It is difficult to define computer science today
because it is so widely used by people across the world in diverse capacities. Definitions are about defining
boundaries and excluding practices, which may be helpful for understanding the practices of a certain culture
or group that is “doing” computer science, but it can never truly represent everyone and all the things that
people are doing with computer science. However, computer science’s historical roots in mathematics shape
the way it categorizes subfields:

• Theoretical computer science


◦ Theory of computation
◦ Information representation
◦ Data structures and algorithms
◦ Programming language and formal methods

• Computer systems
◦ Architecture
◦ Artificial intelligence
◦ Networks




◦ Security
◦ Databases
◦ Distributed computing
◦ Graphics

• Applied computer science


◦ Scientific computing
◦ Human-computer interaction
◦ Software engineering

In this hierarchy, theoretical computer science and computer systems are treated separately from applied
computer science and human-computer interaction, suggesting that the mathematics of computing are pure
and separate from social questions. Yet we’ve seen several examples that question this paradigm and instead
point to a structure where human-computer interaction is infused throughout the study of computer science
and all its subfields.

Today, computer science is a field that is just as much about people as it is about computer technology
because each of these subfields is motivated by the real-world problems that people ultimately want to solve.
The subfields of artificial intelligence and machine learning have applications that directly influence human
decision-making, ranging from advertisement targeting to language translation to self-driving cars. Effective
computational solutions to research or business problems require combining specific knowledge with
computer science concepts from a combination of areas. For example, the computational science application
of weather prediction combines knowledge about various subfields of computer science (algorithms,
distributed computing, computer systems) with knowledge about climate systems. Theoretical computer
scientists are increasingly interested in asking questions such as, “How do we design, analyze, and evaluate
algorithms or information systems for fairness? How do we even define fairness in a computer system that
strips away the complexities of the real world? What ideas or information are encoded in the data? And what
are the limits of our approaches?” Computer science is a complex field, and its synergistic nature means that
when computer science is used in an interdisciplinary manner that shapes other disciplines, its impact on
society is much greater than when each discipline functions on its own.

1.3 Computer Science and the Future of Society


Learning Objectives
By the end of this section, you will be able to:
• Discuss how computer scientists develop foundational technologies
• Discuss how computer scientists evaluate the negative consequences of technologies
• Discuss how computer scientists design technologies for social good

As noted earlier, computer science is a powerful tool, and computer scientists have vast technological
knowledge that continues transforming society. Computer scientists have an obligation to be ethical and good
stewards of technology with an emphasis on responsible computing. Written code influences daily life, from
what we see on social media to the news stories that pop up in a Google search and even who may or may not
receive a job interview. When computer scientists don’t consider the ramifications of their code, there can be
unintended consequences for people around the world. The Y2K problem, also known as the “millennium
bug,” is a good example: a shortsighted decision led computer scientists to store only the last two
digits of the year instead of all four. This made sense at a time when memory was expensive on both mainframe
computers and early versions of personal computers. John Hamre, the United States Deputy Secretary of
Defense, later called the Y2K problem the “electronic equivalent of the El Niño.”15 The
future of computer science will profoundly shape the future of the world. Although we often think of computer
technologies as changing the way the world works, it is actually people and their vision for the future that are
amplified by computing. The relationship between computer science and people is about how computer
technologies can bias society and how the choices made through computer systems can both promote and
discourage social inequities. Computer technologies can encode either of these values, or both, in their designs. In
this section, we’ll introduce three ways that computer science can shape the future of society: developing
foundational technologies, evaluating negative consequences of technologies, and designing technologies for
social good.

Developing Foundational Technologies


We’ve seen how foundational technologies like artificial intelligence, algorithms, and mathematical models
enable important applications in data science, computational science, and information science.

As noted previously, artificial intelligence (AI) is the development of computer functions to perform tasks, such
as visual perception and decision-making processes, that usually are performed by human intelligence. AI
refers to a subfield of CS that is interested in solving problems that typically require human cognitive
processes to achieve goals, often by applying machine learning. AI research seeks to develop algorithm architectures that can
make progress toward solving problems. One such example is image recognition, or the problem of
identifying objects in an image. This problem is quite difficult for programmers to solve using traditional
programming methods. Imagine having to define very precise rules or instructions that could identify an
object in an image regardless of its position, size, lighting conditions, or perspective. As humans, we have an
intuitive sense of the qualities of an object. However, representing this human intelligence in a machine
requiring strict rules or instructions is a much harder task. AI methods for image recognition involve designing
algorithm architectures that can generalize across all the possible ways that an object can appear in an image.

INDUSTRY SPOTLIGHT

Agricultural Robots
Agricultural robots help large-scale industrial farmers produce crops more efficiently and support
sustainability efforts. One agricultural robot is now being used to improve fertilizer and pesticide
treatments by taking pictures of plants as a farmer drives a tractor over the field. Artificial intelligence
techniques are used to recognize and identify the lettuce plants and weed plants in the image. For each
identified lettuce or weed plant, the robot makes a personalized decision about the best chemical
treatment for the plant in real time as the tractor moves to the next row of crops. This ability to personalize
chemical treatments improves yields and plant quality for large-scale industrial agriculture by producing
more crops with fewer chemicals.

15 “Looking at the Y2K bug,” portal on CNN.com. Archived 7 February 2006 at the Wayback Machine. https://web.archive.org/web/
20060207191845/http://www.cnn.com/TECH/specials/y2k/




Recent approaches to AI for image recognition draw on a family of methods called neural networks instead of
having programmers craft rules or instructions by hand to form an algorithm. In humans, the neural network
is a complex network in the human brain that consists of neurons, or nerve cells, connected by synapses that
send messages and electrical signals to all parts of the body, enabling us to think, move, feel, and function.

In computer science, a neural network (Figure 1.9) is an AI algorithm architecture that emphasizes
connections between artificial nerve cells whose behavior and values change in response to stimulus or input.
These neural networks are not defined by individual neurons but by the combination of all the neurons in the
network. Typically, artificial neurons are arranged in a hierarchy that aims to capture the structure of an image.
Although the first level of neurons might respond to individual pixels, later levels of artificial neurons might
respond in aggregate to the arrangement of several artificial neurons in the preceding layer. This is similar to
how the human visual system responds to edges at the lower levels, then responds in aggregate to the specific
arrangement of several edges in later levels, and ultimately identifies these aggregated arrangements as
objects.

Figure 1.9 (a) An artificial neural network consists of three key layers: the input layer, where raw data enters the system; the hidden
layer, where information is processed and patterns are identified; and the output layer, where results are presented. (b) A natural
neural network, such as those in the human body, mirrors this structure. The input layer represents sensory receptors, like those in
the retina. The hidden layer corresponds to the synapse, where partial processing of the sensory data occurs. Finally, the output
layer represents the information sent to the brain for final processing and interpretation. (credit a: modification of "Neural network
example" by Wiso/Wikipedia, Public Domain; credit b: modification of work from Psychology 2e. attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)
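To connect this layered structure to code, the minimal sketch below runs one forward pass through a tiny network with an input layer, a single hidden layer, and an output layer. The layer sizes, weights, and input values are arbitrary placeholders rather than values learned from data; a real network would adjust its weights by training on many labeled examples.

import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    # Squashes each value into the range (0, 1); a common activation function
    return 1.0 / (1.0 + np.exp(-z))

n_inputs, n_hidden, n_outputs = 4, 3, 2            # sizes chosen only for illustration
W_hidden = rng.normal(size=(n_hidden, n_inputs))   # connections from input to hidden layer
W_output = rng.normal(size=(n_outputs, n_hidden))  # connections from hidden to output layer

def forward(x):
    hidden = sigmoid(W_hidden @ x)     # hidden layer responds to patterns in the input
    return sigmoid(W_output @ hidden)  # output layer combines those responses into scores

x = np.array([0.5, -1.0, 0.25, 0.8])   # a made-up input, such as four pixel intensities
print(forward(x))                      # two output scores, each between 0 and 1

Each output score can be read as the network's confidence in one of two possible answers; training would nudge W_hidden and W_output so that those scores match the labels of the training examples.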

The idea of neural networks, however, is not as new as it might seem. Artificial neural networks were first
imagined in the mid-1900s alongside contemporary research efforts in the cognitive sciences. The ideas of
multilayered, hierarchical networks of neurons and the mathematical optimization methods for learning were
all there, but these early efforts were limited by the computational processing power available at the time. In
addition, the large datasets that drive neural network learning were not nearly as available as they are today
with the Internet. Developments in foundational technologies such as computer architecture and computer
networks paved the way for the more recent developments in neural network technologies. The broad area of
computer systems investigates these architectures and networks that enable new algorithms and software.
Without these technologies, neural networks would not be nearly as popular and revolutionary as they are
today. Yet the relationship between computer systems and AI development is not one-directional. Today,
computer scientists are using neural networks to help design new, more efficient computer systems. The

development of foundational computer technologies not only creates opportunities for direct and indirect
applications, but also supports the development of other computer technologies.

Just as we saw how technological fixes embodied a powerful belief about the relationship between computer
solutions and social good, a similar cultural belief exists about the relationship between foundational
technologies and their social values. The belief that technologies are inherently neutral and that it is the
people using technology who ultimately make it “good” or “bad” is considered social determination of
technology.

THINK IT THROUGH

Social Determination of Technology


Do you agree with the social determination of technology? Is it possible for technology—before it is used by
people to solve certain problems—to encode social values? Try to come up with an example that would
support this belief. What about an example that refutes this belief? What are the social implications of
agreeing or disagreeing with the social determination of technology?

Today’s neural networks are designed to identify patterns and reproduce existing data. It is widely accepted
that many big datasets can encode social preferences and values, particularly when the data is collected from
users on the Internet. A social determination of technology accepts this explanation of AI bias and leaves the
design of AI algorithms and techniques as neutral: the bias in an AI system is attributed to the social values of
the data rather than the design of the AI algorithms. Critics of social determination point out that the way AI
algorithms learn from big data represents a social value, one that encodes a default preference for
reproducing the biases inherent in big data. This applies whether the AI application is about fair housing,
medical imaging, ad targeting, drone strikes, or another topic. This is an issue that computer scientists must
consider as they practice responsible computing and strive to ensure that data is gathered and handled as
ethically as possible.

Evaluating Negative Consequences of Technology


Today’s AI technologies work by reproducing existing patterns rather than imagining radically different
futures. As much as neural networks are inspired by the human brain, it would be a stretch to suggest that AI
systems have any semblance of general intelligence. Though these systems might be quite effective at
distinguishing lettuce plants from weed plants in an image, their capacity for humanlike intelligence is limited by
design. A neural network learns to recognize similar patterns that appear across millions or billions of sample
images and represent these patterns with millions or billions of numbers. Mathematical optimization methods
are used to choose the numeric values that best encode correlations across the sample images. However,
current approaches lack a deeper, conceptual representation of objects. One criticism of very large neural
networks is that there are often more numeric values than there are sample images—the network can
effectively memorize the details of a million sample images by encoding them in a billion numbers. Many of
today’s neural networks recognize objects in images not by relying on some intrinsic idea or concept of objects
but by memorizing every single configuration of edges as they appear in the sample images.

This limitation can lead to peculiar outcomes for image recognition systems. Often, neural network
approaches for image recognition have certain examples of images where objects are misidentified in unusual
ways: a person’s face might be recognized in a piece of toast or in a bunch of clouds in the sky. In these
examples, the pattern of edges might coincidentally trigger the neural network values so that it misidentifies
objects. These are among the more human-understandable examples; there are many other odd situations
that are less explainable. An adversarial attack is a sample input (e.g., an image) that is designed to cause a
system to behave problematically. Researchers have found that even tweaking the color of just a single point
in an image can cause a chain reaction in the neural network, leading it to severely misidentify objects. The




adversary can choose the color of the point in such a way as to almost entirely control the output of some
neural networks: changing a single specific point in an image of a dog might cause the system to recognize
the object as a car, airplane, human, or almost anything that the adversary so desires. Moreover, these
adversarial attacks can often be engineered to cause the neural network to report extremely high confidence
in its wrong answers. Self-driving cars that use neural networks for image recognition can be at risk of real-
world adversarial attacks when specially designed stickers are placed on signs that cause the system to
recognize a red light as a green light (Figure 1.10). By studying adversarial attacks, researchers can design
neural networks that are more robust and resilient to these attacks.

Figure 1.10 (a) Autopilot functions in self-driving cars generally identify roads and lanes using artificial intelligence to “see” road
markings. (b) Researchers were able to trick these cars into seeing new lanes by using as few as three small stickers, to confuse the
neural networks and force the cars to change lanes. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
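The toy sketch below illustrates the core idea of an adversarial perturbation. It attacks a simple linear classifier rather than a real neural network, and the "image," weights, and class labels are all invented, but it shows how a small, carefully chosen change to every pixel can flip a model's prediction.

import numpy as np

# Toy linear "classifier": a positive score means "dog," a negative score means "car."
# The weights and the flattened 10-by-10 "image" are random placeholders.
rng = np.random.default_rng(0)
weights = rng.normal(size=100)
image = rng.normal(size=100)

def predict(x):
    return "dog" if weights @ x > 0 else "car"

score = weights @ image
print("original prediction:", predict(image))

# Adversarial perturbation: nudge each pixel slightly in the direction that pushes
# the score toward the opposite class, scaled just enough to flip its sign.
epsilon = 1.1 * abs(score) / np.abs(weights).sum()
perturbation = -np.sign(score) * np.sign(weights) * epsilon
adversarial_image = image + perturbation

print("size of the change applied to each pixel:", epsilon)
print("adversarial prediction:", predict(adversarial_image))

Against a deep neural network the same idea is carried out using gradients of the network's output rather than its raw weights, and making models robust to such perturbations remains an active research area.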

In general, research is an important part of computer science. Through research, computer scientists analyze
ways that technology can be used and gain insight and answers to address issues and improve various aspects
of society. Research enables computer scientists to make advancements like the design of new algorithms,
development of new hardware and software, and applications for emerging technologies such as AI.

One important use of research is investigating adversarial attacks, which gives computer
scientists the answers they need to improve foundational technologies by evaluating their negative consequences.
Computer technologies offer a unique medium for learning things (not just learning computer science),
connecting with each other, and enhancing the lives of people all around the world. Yet, in each of these
examples, we also raised concerns about how these technologies unfolded and affected people’s lives in both
positive and negative ways. While we can rarely, if ever, paint any one technology as purely “good” or “bad,”

computer scientists are interested in studying questions around how technologies are designed to center
social values. Social scientists are not solely responsible for answering questions about technology, but
computer scientists can also contribute important knowledge and methods toward understanding computer
technologies.

Designing Technologies for Social Good


Computer science can advance social good by benefiting many people in many different areas, including
public health, agricultural sustainability, climate sustainability, and education.

Computer technologies accelerate medical treatments for public and personal health from initial research and
development to clinical trials to large-scale production and distribution. In January 2020, Chinese officials
posted the genetic sequence of the coronavirus SARS-CoV-2. This helped pharmaceutical companies to begin
developing potential vaccines for the virus at a significantly faster rate than for any other virus in the past
(Figure 1.11).

Figure 1.11 The SARS-CoV-2 outbreak that began in 2020 displayed how quickly computer science could be harnessed by
governments, medical facilities, and scientists to decode the virus, develop treatments, and distribute vaccinations around the world.
What would have been a very difficult feat to manage manually was simplified through the use of algorithms and computer
technology. (credit left: modification of "COVID-19 vaccines" by Agência Brasília/Flickr, CC BY 2.0; credit center: modification of "T04"
by Sarah Taylor/Flickr, CC BY 2.0; credit right: modification of “Back2School Brigade prepares families for upcoming school year” by
Thomas Karol/DVIDS, Public Domain)

Computational science enables the miracles of modern medicine. Viral sequences can be digitized and rapidly
shared between researchers across the world via the Internet. Computer algorithms and models can simulate
the human immune system responses to particular treatments within hours rather than years. The first
treatments can then be produced at a small scale using computer-engineered cells in less than a month from
the initial sequencing. To ensure the treatments are safe and effective, clinical trials are held at disease
transmission “hot spots” predicted using data science methods drawing on data aggregated and monitored
from across the world. Once a treatment is proven safe and effective, it is mass-produced with the help of
computer-controlled robots and automated assembly lines. Algorithms manage the inventory supply and
demand and control the transportation of treatments on trucks and planes guided by computer navigation
systems. Web apps and services notify people throughout the process.

Yet the use of computer technology throughout modern medicine is anything but politically neutral.
Computers, algorithms, and mathematical models solve the problems that their creators wish to solve and
encode the assumptions of their target populations. Supply and demand data for the data models are




determined by various factors, at least partly in response to the money and power relationships between the
countries that control the technology (the Global North) and those that do not (the Global South). Within local
communities, the uptake of medical treatments is often inequitable, reflecting and reinforcing historical
inequities and disparities in public health. Computer technology alone often doesn’t address these issues. In
fact, without people thinking about these issues, computer technologies can often amplify disparities.
Consider datasets, which can be biased if they overrepresent or underrepresent specific groups of people. If
decisions are made on the basis of biased data, people in the groups that are not represented fairly may
receive inequitable treatment. For example, if a local government agency is working with a biased dataset,
political leaders may make decisions that result in certain citizens receiving inadequate funding or services.
This is an example of why responsible computing, which we will cover in Chapter 14 Cyber Resources Qualities
and Cyber Computing Governance, is so important.

These problematic histories are evident not only in medicine and public health, but also in
housing. Redlining refers to the inequitable access to basic public services based on residents’ neighborhoods
and communities, which includes the practice of withholding financial services from areas with a large
underrepresented population. In the United States, these communities reflect the histories of racial
segregation and racial wealth inequalities. Fair housing laws are intended to prevent property owners from
discriminating against buyers or renters because of race, color, ability, national origin, and other protected
classes. But computer technologies also present new kinds of challenges. Microtargeted ads on social media
platforms contribute to not only political polarization, but also discrimination in housing. This can be a
particular problem when combined with redlining. Even if the ad targeting is not explicitly designed to
discriminate, microtargeted ads can still reinforce historical redlining by incorporating data such as zip codes
or neighborhoods. This may result in digital redlining, which is the practice of using technology, such as
targeted ads, to promote discrimination. In 2021, a Facebook user filed a class-action lawsuit that argued nine
companies in the Washington, D.C., area deliberately excluded people over the age of 50 from seeing their
advertisements for housing because they wanted to attract younger people to live in their apartments.16 This
is an example of an issue in technology that should be addressed by responsible computing with an emphasis
on ethical behavior.

With good intentions and attention to personal biases, technologies can be designed for social good. For
example, a hypothetical algorithm for fair housing could evenly distribute new housing to people across
protected classes and marginalized identities, such as older populations. Of course, algorithmic control and
automated decision-making struggle to account for the underlying conditions behind social problems. Still,
algorithms can be important tools to enable us to distribute outcomes more fairly from a statistical
perspective, and this can be an important step in addressing the larger societal systems and inequities that
produce social problems.

LINK TO LEARNING

Review the Parable of the Polygons (https://openstax.org/r/76polygons) by Vi Hart and Nicky Case. In it, the
authors show how a segregated world where people simply prefer living near other people who are like
themselves (a “small individual bias”) can re-create and reproduce “large collective bias.”

As part of responsible computing, computer scientists must be aware of the technological fix, which refers to the
idea that technologies can solve social problems, but is now often used to critique blind faith in technological
solutions to human problems. Unless the process is handled responsibly, the “fix” may cause more problems
than it resolves. When considering how to address social and political problems, computer scientists must take
care to ensure that they select the appropriate technology to address specific problems.

16 C. Silva, “Facebook ads have a problem. It’s called digital redlining,” 2022. https://mashable.com/article/facebook-digital-
redlining-ads-protected-traits-section-230

To address social problems and advance social good, recall that human-centered computing emphasizes
people rather than technologies in the design of computer solutions. A human-centered approach to fair
housing might begin by centering local communities directly affected by redlining. Rather than replacing or
disrupting the people and organizations already working on a problem, a human-centered approach would
center them in the design process as experts. A human-centered approach requires that the designer ask why
they are not already working with people in the community impacted by their work.

LINK TO LEARNING

Anatomy of an AI System (https://openstax.org/r/76AIanatomy) illustrates how an AI system like the
Amazon Echo does not just involve computer technology, but also involves a vast and deeply
interconnected web of human labor, data, and physical resources that are often taken for granted.
Evaluation of the negative consequences of technology does not end at the technology itself, but also
considers its broad-reaching impacts and implications for people around the world.




Chapter Review
Key Terms
adversarial attack sample input (e.g., an image) that is designed to cause a system to behave
problematically
algorithm sequence of precise instructions
artificial intelligence (AI) development of computer functions to perform tasks, such as visual perception
and decision-making processes, that usually are performed by human intelligence
big data very large datasets that aren’t easily processed using spreadsheets
computational science application of computing concepts and technologies to advance scientific research
and practical applications of science knowledge
computer program algorithms that can be run on a computer
computer science (CS) study of the phenomena surrounding computers
computing all phenomena related to computers
data science interdisciplinary field that applies computing toward managing data and extracting information
from data
hardware physical, real-world materials that enable computation
human-computer interaction (HCI) subfield of computer science that emphasizes the social aspects of
computation
image recognition problem of identifying objects in an image
information science interdisciplinary field studying information technologies and systems as they relate to
people, organizations, and societies
memory means of addressing information in a computer by storing it in consistent locations
network various technological devices that are connected and share information
neural network AI algorithm architecture that emphasizes connections between artificial “neurons” whose
behavior and values change in response to stimulus or input
processor computer’s “brain” that follows instructions from algorithms and processes data
programming language language consisting of symbols and instructions that can be interpreted by a
computer
social determination of technology belief that technologies are inherently neutral, and that it is the people
who use a technology who ultimately make it “good” or “bad”
software algorithmic principles that determine how results are computed
software engineering subfield of computer science that emphasizes how problems can be solved with
computers as well as the practices and processes that can help people design more effective software
solutions
spreadsheet data-centric programming environment where data is organized into cells in a table
storage hardware and physical components of a computer that permanently house a computer’s data
technological fix idea that technologies can solve social problems, but now often used to critique blind faith
in technological solutions to human problems
theoretical computer science mathematical processes behind software
Turing-complete fundamental model for computing results; any Turing-complete computer has the ability to run any
algorithm
vacuum tube physical device, similar to a light bulb, that was used as memory in early digital computers

Summary
1.1 Computer Science
• Computer science is pervasive in our daily lives, business and industry, scientific research and
development, and social change.
• Computer science (CS) is the study of computing, which includes all phenomena related to computers,
such as the Internet. With foundations in engineering and mathematics, computer science focuses on
studying algorithms, which are instructions that enable computing. This includes computer hardware and
software and the way these are used to process information. Three perspectives on computers include the
hardware perspective, software perspective, and theoretical perspective. These perspectives each
emphasize different aspects of computation, and they’re often centered in undergraduate computer
science because of the history of computer science, but there are other perspectives on computer science.
• People have used computer science to advance many more diverse goals beyond making war or making
money. Computing was imagined as: a new medium for helping people learn everything; a new
technology that could enable anti-racism; a means of enabling global development for peoples across the
world. Yet these visions and promises are still taking hold in a world largely focused on the dominant
history of computer science.

1.2 Computer Science across the Disciplines


• By contributing tools and resources to handle tasks and improve operations, computer science enables
many other fields and areas of research or development.
• Data science is an interdisciplinary field that applies computing to managing data and extracting
information from data. Many millions of people engage in data science work by using spreadsheets. Still,
data science also often emphasizes larger-scale problems involving big data that are hard to manage
using spreadsheets alone.
• Computational science refers to applying computing concepts and technologies to advance scientific
research and practical applications of science knowledge. Computer science’s emphasis on creating things
can help other sciences by, for example, contributing new models or simulations that can enable the
discovery of new kinds of scientific knowledge previously inaccessible to scientists.
• Information science is an interdisciplinary field studying information technologies and systems as they
relate to people, organizations, and societies. As computing is now so central to information management
and information exchange, information science has significant overlap with computer science. Still, it
tends to emphasize the social value of information, whereas computer science has (historically)
emphasized algorithms and computation over people or information.
• Today, computer science is an interdisciplinary field that contributes to all other fields. Effective
computational solutions to research or business problems require combining domain-specific knowledge
with computer science concepts from a combination of areas.

1.3 Computer Science and the Future of Society


• Computer science is shaping the future of society. There are three ways in which computer science can
shape the future of society: developing foundational technologies, evaluating negative consequences of
technologies, and designing technologies for social good.
• As one example of developing foundational technologies, computer science’s rapid development of
artificial intelligence technologies (and the current trend around neural networks) has enabled many new
applications like image recognition. These developments often do not occur in isolation: the popular use
of neural networks, for example, depended on new computer architectures and advancements in the
Internet (computer networks). Technologies can encode social values: neural networks are designed to
learn from big data, so they encode a preference for the contemporary social realities that produced the
data.
• As one example of evaluating negative consequences, computer science considers the philosophical and
practical limitations of neural networks. Research into adversarial attacks can enable computer scientists
to develop more robust neural networks that are safer and more effective.
• As one example of designing technologies for social good, computer science contributes to the research,
development, mass production, delivery, and logistics of modern medicine from beginning to end. Yet
applications for social good are often embedded in broader social and political dynamics that computer
science has difficulty addressing. Even though computer technologies can be designed for social good,
they can cause harm when their design processes fail to center on human values and diverse users.




Review Questions
1. What is computer science?

2. What two subjects does computer science combine the foundations of?
a. math and engineering
b. math and physics
c. physics and engineering
d. math and chemistry

3. What can execute an algorithm?

4. What enables the ENIAC (one of the first digital computers, invented in 1945) to compute anything that can
run on modern computers?
a. Both the ENIAC and modern computers have memory.
b. Both the ENIAC and modern computers share the same hardware.
c. Both the ENIAC and modern computers are considered Turing-complete.
d. Both the ENIAC and modern computers run the same software.

5. What invention was credited as the first calculator?


a. punch cards
b. abacus
c. Difference Machine
d. ENIAC

6. What term is considered an algorithm that can be run on a computer?


a. artificial intelligence
b. algorithm
c. computer program
d. programming language

7. Why is computer science often not considered a science?


a. Computer science does not study natural objects.
b. Computer science emphasizes the discovery of natural phenomena.
c. Computer science is spreadsheet-based.
d. Computer science focuses on computational structures.

8. What is the definition of data science?


a. a subfield of computer science that emphasizes the social aspects of computation
b. an interdisciplinary field studying information technologies and systems as they relate to people,
organizations, and societies
c. a subfield of computer science that emphasizes how problems can be solved with computers as
well as the practices and processes that can help people design more effective software solutions
d. an interdisciplinary field that applies computing toward managing data and extracting information
from data

9. What term is used to describe a subfield of computer science that emphasizes how a computational
problem can be defined in mathematical terms and whether that mathematical problem can be efficiently
solved with a computer?
a. computational science
b. theoretical computer science
c. information science
d. data science

10. What subfield of computer science relates information technology to people and society?
a. computational science
b. theoretical computer science
c. information science
d. data science

11. How does computational science contribute new methods to the study of the sciences?

12. How do information science and computer science compare?

13. What does it mean to say that the various areas of computer science are synergistic?

14. What term is defined as an approach that emphasizes people rather than technologies in the design of
computer solutions?
a. human-centered computing
b. neural network
c. social determination of technology
d. technological fix

15. A software application takes an image as an input and analyzes it. This is an example of what?
a. illustrative processing
b. image recognition
c. image generation
d. analytical modeling

16. What are adversarial attacks and why is it important to study them?

17. What is the relationship between artificial intelligence, image recognition, and neural networks?

18. How do neural networks recognize objects in images?

Conceptual Questions
1. Give two definitions of computer science. How do they compare?

2. Explain the concept of theoretical computer science.

3. What is body-syntonic reasoning and how has it affected education?

4. In terms of data science, what is a spreadsheet and why can it be said that by using spreadsheets, people
are programming without realizing it?

5. How do discovery and invention differ and how are these involved in computer science?

6. Describe in your own words the difference between data (as in data science) and information (as in
information science). How does computer science shape both fields?

7. Give a real-life example that refutes social determination of technology. Your example does not need to
involve computing, but it should involve some technology designed by humans.

8. Artificial intelligence approaches are typically used to solve problems that require specific kinds of
“intelligence.” Describe a real-life computational problem or application that does not need artificial
intelligence.

9. Image recognition systems that can detect objects in images enable self-driving cars and many large-scale


manufacturing or agricultural operations. Give another example where image recognition could be used
as part of a larger system to automate decisions at scale.

Practice Exercises
1. Research where the term debugging originated from and why it refers to finding problems in our
programs today.

2. Research some examples of computational models and how they are a part of everyday life.

3. Research how computational models relate to mathematics.

4. Research some examples of how artificial intelligence is used across multiple industries. Summarize at
least two different industries and how AI is currently being used.

5. Research ethical issues related to artificial intelligence and provide an example of how artificial intelligence
can be misused for unethical and even criminal ways.

Problem Set A
1. Research examples of how modeling and simulations have led to new inventions and discoveries.

2. Research and provide a summary of the difference between artificial intelligence and machine learning.

Problem Set B
1. Research how artificial intelligence and machine learning can improve the accuracy of computational
models and lead to cutting-edge technology inventions in the future. Provide a specific example of how AI
and ML have been used in this way so far.

2. Research and provide a summary of how machine learning is a subset of artificial intelligence and plays a
key role in artificial intelligence systems.

Thought Provokers
1. Corporate social responsibility is the idea that businesses have a responsibility to society, including the
areas of environmental responsibility, ethical responsibility, philanthropic responsibility, and economic
responsibility. Given the contentious history of computer science and computer technologies, what can
businesses (or businesspeople) that wish to employ a “disruptive” computer technology do to ensure
corporate social responsibility?

2. Computer technologies like the Internet have changed everyone’s lives, regardless of whether they use the
Internet directly or not. Yet, with computer technologies, the future is rarely certain. How can a business
stay relevant and profitable in the face of new technologies while ensuring corporate social responsibility?
In what ways does ensuring corporate social responsibility create economic value and more diverse kinds
of value?

3. Give a real-life public policy problem involving a computer technology or dataset and use it to illustrate
differences between the fields of data science, information science, and computer science.

4. Mathematics is one of three perspectives that computer scientists use to design, analyze, and evaluate
computational structures, systems, and processes. However, mathematics is often regarded as an abstract
or “pure” field of study that is not involved in social or political concerns. How might computer science’s
ability to automate and represent problems in mathematical terms have social or political consequences?

5. The future of society will be shaped by the philosophy of computer science and people’s purposes,
motivations, and political values. Give another philosophy that might influence or affect how computer
scientists go about creating solutions.

6. If your organization is interested in artificial intelligence technology to enhance operations, what could
you do to ensure the system is designed and implemented in a safe, socially responsible, and just manner?

Labs
1. Explore the Parable of the Polygons (https://openstax.org/r/76polygons). How does computer science
contribute to the simulation? What does the simulation suggest is needed in the world? What are the
limitations of the simulation as a model of the much more complicated real world?

2. Explore the Anatomy of an AI System (https://openstax.org/r/76AIanatomy) that breaks down the “human,
labor, data and planetary resources” behind an Amazon Echo device. What parts surprised you? Based on
your understanding of computer science, what parts are emphasized in the public conscience? What parts
are downplayed or minimized? Then, select one surprising aspect and investigate it further.

3. Explore DALL-E (https://openstax.org/r/76DALL-E), a very large neural network created by OpenAI “that
creates images from text captions for a wide range of concepts expressible in natural language.” If the
neural network is learning from English language images and captions on the Internet, what are some of
the social risks of this system? How might it encode problematic ideas about marginalized people in
society?



2
Computational Thinking and Design Reusability

Figure 2.1 A work environment, such as this modifiable “sp.ace” at Johnson Space Center where furniture can move and any surfaces
can be written on, can encourage computational thinking and lead to innovation. (credit: modification of “The sp.ace in Building 29 at
Johnson Space Center” by Christopher Gerty, NASA JSC/APPEL Knowledge Services, Public Domain)

Chapter Outline
2.1 Computational Thinking
2.2 Architecting Solutions with Adaptive Design Reuse in Mind
2.3 Evolving Architectures into Useable Products

Introduction
In the rapidly evolving landscape of technology and innovation in various domains, computational thinking
promotes design reusability and is a fundamental skill set essential for problem-solving. This chapter
illustrates how computational thinking, through practical insights and theoretical frameworks, facilitates the
creation of reusable designs that improve the qualities (e.g., scalability, efficiency) of solutions.

Developing innovative solutions to business problems today involves reinventing, rethinking, and rewiring
existing business solutions and leveraging the latest technologies to assemble competitive products that solve
business problems. It is key to apply computational thinking throughout this process to tackle new problems
in specific areas. Computational thinking is a problem-solving and cognitive process rooted in principles
derived from computer science. It involves breaking down complex problems into smaller, more manageable
parts and devising systematic approaches to solve them.

Adaptive design reuse also makes it possible to quickly assemble business solutions by assembling existing
design building blocks that require minimal customizations to be a good match for the problem at hand.
TechWorks is a company focused on developing innovative products and services. TechWorks is looking to take
advantage of computational thinking and adaptive design reuse to enable the creation of next-generation,
secure, super society, intelligent, autonomous business solutions. At TechWorks, a skilled team of engineers,
data scientists, and designers is on a mission to revolutionize society. Their goal is to create advanced and
secure autonomous business solutions. The team believes in the power of computational thinking and
adaptive design reuse for success.

Led by the CIO, the team gathered in a cutting-edge laboratory to tackle challenges in transportation, security,
and automation. They embraced computational thinking by applying algorithms and machine learning to
analyze data. Recognizing the efficiency of adaptive design reuse, the team explored successful projects like
robotics and self-driving cars for inspiration. These projects have become the foundation for their own
innovation. With minimal adjustments, they seamlessly integrate these building blocks into comprehensive
solutions such as self-driving cars that can smoothly navigate the city, drones that can monitor public spaces,
and robotics that automate tasks. The company plans to bring its vision of the future to life by transforming
cities into hubs of interconnected, intelligent systems. Knowing that innovation is a continuous process that
requires rapidly evolving solutions, the team faced challenges while implementing their initial prototype.
However, they were able to adapt their super society solutions using computational thinking and adaptive
design reuse to ensure that they stay ahead of technological advancements. TechWorks is a symbol of the
successful integration of forward-thinking strategies to create secure and technologically advanced super
society solutions.

2.1 Computational Thinking


Learning Objectives
By the end of this section, you will be able to:
• Define computational thinking
• Discuss computational thinking examples

This chapter presents key aspects of computational thinking, including logical thinking, assessment,
decomposition, pattern recognition, abstraction, generalization, componentization, and automation. These
elements guide how computer scientists approach problems and create well-designed solution building blocks
at both the business and technical levels. Computational thinking often involves a bottom-up approach,
focusing on computing in smaller contexts, and seeks to generate innovative solutions by utilizing data
structures and algorithms. Additionally, it may make use of existing design building blocks like design patterns
and abstract data types to expedite the development of optimal combinations of data structures and
algorithms.

What Is Computational Thinking?


The problem-solving and cognitive process known as computational thinking is rooted in principles derived
from computer science. It involves breaking down complex problems into smaller, more manageable parts and devising
systematic approaches to solve them. Complex problems are situations that are difficult because they involve
many different interrelated parts or factors. These problems can be hard to understand and often don’t have
simple solutions.

While “computational thinking” is still perceived by some as an abstract concept without a universally accepted
definition, its core value is to facilitate the application of appropriate strategies and tools to address complex
problems. In problem-solving, computers play a central role, but their effectiveness depends on a prior
understanding of the problem and its potential solutions. Computational thinking serves as the bridge
between the problem and its resolution. It empowers solution designers to navigate the complexity of a given
problem, separate its components, and formulate possible solutions. These solutions, once developed, can be
communicated in a manner that is comprehensible to both computers and humans, enabling effective
problem-solving.

LINK TO LEARNING

Computational thinking serves as the intermediary that helps us read complex problems, formulate
solutions, and then express those solutions in a manner that computers, humans, or a collaboration of both


can implement. Read this article for a good introduction to computational thinking (https://openstax.org/r/
76CompThinking) from the BBC.

Vision
To further qualify computational thinking, Al Aho of the Columbia University Computer Science Department
describes computational thinking as “the thought processes involved in formulating problems so their
solutions can be represented as computational steps and algorithms.” Jeannette Wing, also of Columbia
University, brought the idea of computational thinking to prominence in a paper she wrote in 2006 while at
Carnegie Mellon University. She believes that computational thinking details the mental acts needed to
compute a solution to a problem either by human actions or machine. Computational thinking encompasses a
collection of methods and approaches for resolving (and acquiring the skills to resolve) complex challenges,
closely aligned with mathematical thinking through its utilization of abstraction, generalization, modeling, and
measurement (Figure 2.2). However, it differentiates itself from mathematics alone through its explicit focus
on computation and the potential advantages that computation offers.

Figure 2.2 This diagram illustrates the main components of computational thinking. (attribution: Copyright Rice University,
OpenStax, under CC BY 4.0 license)

Critical thinking is an important skill that can help with computational thinking. It boils down to understanding
concepts rather than just mastering technical details for using software, prioritizing comprehension over rote
learning. It’s a core skill, not an extra burden on a curriculum checklist, and it uniquely involves humans, not
computers, blending problem-solving and critical thinking. Critical thinking focuses on ideas, not tangible
items, applying advanced thinking to devise solutions. Critical thinking is essential for everyone and is
comparable to foundational abilities like reading, writing, and arithmetic.

Computational Thinking Concepts


The description provided by the International Society for Technology in Education (ISTE) outlines the key
components and dispositions of computational thinking. Let’s explore each characteristic in more detail:

• Decomposition: The analytical process of breaking down complex concepts or problems into smaller parts
is called decomposition. This approach helps analyze and solve problems more effectively.
• Pattern recognition (logically organizing and analyzing data): Computational thinking emphasizes the
logical organization and analysis of data. This includes the ability to structure information in a way that
facilitates effective problem-solving.
• Representing data through abstractions: An abstraction is a simplified representation of complex systems
or phenomena. Computational thinking involves representing data through an abstraction, such as a
simulation, which uses models as surrogates for real systems.
• Automation through algorithmic thinking: Using a program or computer application to perform repetitive
tasks or calculations is considered automation.
• Identification, analysis, and implementation of solutions: Computational thinking emphasizes
identification, analysis, and implementation of potential solutions to achieve optimal efficiency and
effectiveness through a combination of steps and resources.
• Generalization and transferability: Generalizing and transferring this problem-solving process across a
wide variety of problems showcases the adaptability and applicability of computational thinking.

These abilities are supported and enriched by fundamental abilities integral to computational thinking. These
abilities involve the following characteristics: confidence in navigating complexity, resilience in tackling
challenging problems, an acceptance of ambiguity, adeptness in addressing open-ended issues, and
proficiency in collaborating with others to attain shared objectives or solutions. Another illustration of
computational thinking is the three As, which organizes the process into three phases, as visualized in Figure 2.3:

1. Abstraction: The initial step involves problem formulation.


2. Automation: Next, the focus shifts to expressing the solution.
3. Analysis: Finally, the process encompasses solution execution and evaluation.


Figure 2.3 The three As—abstraction, automation, analysis—illustrate the power of computational thinking. (credit photo:
modification of “Avalanche on Everest” by Chagai/Wikimedia Commons, Public Domain; credit graph: modification of “Slope stability
calculation for a model landslide” by B. Terhost and Bodo Damm/Journal of Geological Research, CC BY)

Computational Thinking Techniques


In today’s technology world, mastering computational thinking techniques is important. These techniques
offer a systematic way to solve problems using tools like data structures, which are like containers used to
organize and store data efficiently in a computer. They define how data is logically arranged and manipulated,
making it easier to access and work with information in algorithms and programs. There are four key
techniques (cornerstones) to computational thinking, as illustrated in Figure 2.4:

• Decomposition is a fundamental concept in computational thinking, representing the process of


systematically breaking down a complex problem or system into smaller, more manageable parts or
subproblems. By breaking down complexity into simpler elements, decomposition promotes a more
organized approach to problem-solving.
• Logical thinking and pattern recognition is a computational thinking technique that involves the process of
identifying similarities among and within problems. This computational thinking technique emphasizes
the ability to recognize recurring structures, relationships, or sequences in various problem-solving
situations.
• Abstraction is a computational thinking technique that centers on focusing on important information
while ignoring irrelevant details. This technique enables a clearer understanding of the core issues.
• Algorithms are like detailed sets of instructions for solving a problem step-by-step. They help break down
complex tasks into manageable actions, ensuring a clear path to problem-solving.

Figure 2.4 Users can explore the essence of computational thinking through decomposition, logical thinking, abstraction, and
algorithms. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

In addition to the four techniques, computational thinking involves essential steps such as testing and
debugging. Testing is crucial for uncovering errors within the step-by-step instructions or algorithms
employed to tackle a problem. On the other hand, debugging entails identifying and rectifying issues within
the code.

A programmer is someone who writes instructions for a computer to follow. A typical example is that of a
programmer who gives instructions to a robot and tells it to make a jam sandwich. In this case, applying
computational techniques to give instructions to the robot entails the following techniques: decomposition,
logical thinking and pattern recognition, abstraction, and algorithms. These techniques are explained in the
following subsections as they apply to the jam sandwich example.

TECHNOLOGY IN EVERYDAY LIFE

Traffic Accident Data


Analyzing data involves collecting and cleaning information, exploring patterns through visual and
statistical methods, and forming hypotheses. Statistical analysis and visualization are used to draw
conclusions, and findings are interpreted and communicated in reports or presentations to help in the
process of decision-making. Analyze the patterns and trends in traffic accident data to understand the
prevalence of road injuries and fatalities, and examine the progression of traffic incidents over time. To
enhance road safety measures and policies, you should apply computational thinking skills to identify
recurring patterns and abstract the most crucial information from the data. By extracting valuable insights,
you can contribute to the development and refinement of strategies that effectively improve road safety.

Decomposition
Decomposition involves solving a complex problem by breaking it up into smaller, more manageable tasks.
Decomposition enables the consideration of various components essential for solving a seemingly complex
task, allowing it to be redefined into a more manageable problem. In the case of the jam sandwich example,
decomposition involves identifying all the required ingredients and the steps the robot must take to
successfully create a jam sandwich.

Logical Thinking and Pattern Recognition


Pattern recognition makes it possible to group all the different features considered in decomposition into


categories. In the jam sandwich example, pattern recognition leads to grouping the various things identified
via decomposition into categories, in this case, ingredients, equipment, and actions. Therefore, applying
decomposition and pattern recognition will lead to thinking of as many things as possible that are required to
make a jam sandwich. The more things that can be thought of (i.e., ingredients, equipment, and actions), the
clearer the instructions will be. A first attempt at decomposition and pattern recognition is summarized in
Table 2.1.

Ingredients    Equipment    Actions
Bread          Plate        Repeat x times
Jam            Knife        Left hand (LH)
Butter                      Right hand (RH)
                            Pick up
                            Unscrew

Table 2.1 Logical Thinking and Pattern Recognition Example: The jam sandwich pattern recognition defines the
ingredients, equipment, and actions needed for completion.

The process of identifying patterns typically requires logical thinking such as inductive or deductive reasoning.
Inductive reasoning makes it possible to go from specific examples to general principles. For example,
recognizing that dividing any number by 1 results in the original number leads to the broader conclusion that
holds true for any number. Similarly, understanding that the sum of two odd numbers yields an even number
leads to the generalization that adding two odd numbers always results in an even number. Inductive
reasoning turns an observation into a pattern, which allows making a tentative hypothesis that can be turned
into a theory. Deductive reasoning is the process of drawing valid conclusions from premises given the fact
that it is not possible for the premises to be true and the conclusion to be false. A traditional example
illustrates how the premises “all men are mortal” and “Socrates is a man” lead to the deductively correct
conclusion that “Socrates is mortal.”
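As a brief worked check of the inductive example above: any odd number can be written as 2m + 1 for some whole number m, so the sum of two odd numbers is (2m + 1) + (2n + 1) = 2(m + n + 1), which is divisible by 2 and therefore even. Observations like this suggest the general pattern, and a short algebraic argument of this kind confirms it deductively.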

TECHNOLOGY IN EVERYDAY LIFE

Computational Thinking in Our Life


Computational thinking is a method of problem-solving that is extremely useful in everyday life. It involves
breaking down complex issues into manageable parts, identifying patterns, extracting essential
information, and devising systematic solutions. This process not only applies to technical fields, but also to
everyday situations.

For example, imagine someone trying to manage their monthly expenses within a tight budget. Here's how
you might apply computational thinking to this common problem of managing a monthly budget:

1. Decomposition: Break down the financial challenge into different categories such as rent, groceries,
utilities, and entertainment.
2. Pattern recognition: Analyze past spending to identify patterns.
3. Abstraction: Focus on key areas where costs can be reduced.
4. Algorithmic thinking: Develop a systematic approach to allocate monthly income.

By using computational thinking, you can manage your finances more effectively, ensuring you cover
essential costs while maximizing your savings.
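As a minimal code sketch of the algorithmic-thinking step above (an illustration only, not part of the original example), the following C++ fragment allocates a hypothetical monthly income across budget categories using fixed percentages; the category names and shares are made-up assumptions.

#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;

int main() {
    double income = 3000.00;  // hypothetical monthly income

    // Decomposition: the budget is broken into categories, each with an assumed share.
    vector<pair<string, double>> plan = {
        {"Rent", 0.40}, {"Groceries", 0.20}, {"Utilities", 0.10},
        {"Entertainment", 0.10}, {"Savings", 0.20}
    };

    // Algorithmic thinking: apply the same allocation rule to every category.
    for (const auto& item : plan) {
        cout << item.first << ": $" << income * item.second << endl;
    }
    return 0;
}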

Abstraction
Abstraction makes it possible to pull out the important details and identify principles that apply to other
problems or situations. When applying abstraction, it may be useful to write down some notes or draw
diagrams to help understand how to resolve the problem. In the jam sandwich example, abstraction means
forming an idea of what the sandwich should look like. To apply abstraction here, you would create a model or
draw a picture representing the final appearance of the jam sandwich once it is made. This simplifies the
details, providing a clearer image of the desired outcome. Simple tools like the Windows Paint program can be
used to do this, as shown in Figure 2.5.

Figure 2.5 This jam sandwich abstraction example illustrates what the final product should look like. (attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)

In technology, data are represented at different levels of abstraction to simplify user interaction and manage
complex operations efficiently. Users interact with a web application through a straightforward interface, like
requesting help from a GenAI tool, without seeing the underlying complexity. This GenAI prompt is then
processed by the application’s logic, which validates and directs it appropriately, often invisibly to the user.
Finally, at the back end, the prompt is processed and a GenAI-generated response is provided. Each layer of
abstraction serves a separate role, making the entire process efficient for both the user and the system (Figure
2.6).


Figure 2.6 When using GenAI, a user interacts with the interface while the application processes the prompt with layers of
abstraction on the back end. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Algorithm
An algorithm is a sequence of steps/instructions that must be followed in a specific order to solve a problem.
Algorithms make it possible to describe a solution to a problem by writing down the instructions that are
required to solve the problem. Computer programs typically execute algorithms to perform certain tasks. In
the jam sandwich example, the algorithm technique is about writing instructions that the robot can follow to
make the jam sandwich. As you will learn in Chapter 3 Data Structures and Algorithms, algorithms are most
commonly written as either pseudocode or a flowchart. An outline of the logic of algorithms using a
combination of language and high-level programming concepts is called pseudocode. Each step is shown in a
clearly ordered, written structure. A flowchart clearly shows the flow and direction of decisions in a visual way
using a diagram. Either way is fine, and it is a matter of personal preference. Basic templates for the flowchart
and pseudocode are in Figure 2.7.

Figure 2.7 Pseudocode lists each step, while a flowchart visually outlines the process of decision-making. (attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)

Writing algorithms requires practice. Not everyone likes butter in their jam sandwich. The robot needs a
method of making sure it adds or does not add butter, depending on preferences. It is therefore necessary to
account for the following steps in the pseudocode and flowchart:

1. Ask whether there should be butter on the bread.


2. Either spread butter on the bread,
3. Or, do not use butter.

These steps can be added as actions in the table previously shown and expressed as steps in the pseudocode
using programming keywords such as INPUT, OUTPUT, IF, THEN, ELSE, and START. The corresponding
instructions can then be converted into a flowchart using the symbols in Figure 2.8.

Figure 2.8 The symbols used in a flowchart are associated with their instructions. (attribution: Copyright Rice University, OpenStax,
under CC BY 4.0 license)
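To make the connection between the pseudocode steps and an actual program more concrete, here is a minimal sketch of the butter decision written in C++ (the language used for code examples later in this section); the yes/no prompt is a stand-in assumption for however the robot would actually receive the preference.

#include <iostream>
#include <string>
using namespace std;

int main() {
    // INPUT: ask whether there should be butter on the bread
    string answer;
    cout << "Add butter? (yes/no): ";
    cin >> answer;

    // IF ... THEN ... ELSE: choose between the two possible actions
    if (answer == "yes") {
        cout << "Spread butter on the bread." << endl;  // OUTPUT
    } else {
        cout << "Do not use butter." << endl;           // OUTPUT
    }
    return 0;
}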

Algorithm Execution Model Patterns

Various patterns of execution models may be used to step through the instructions provided in an algorithm.
So far, we have only considered the traditional sequential (i.e., step-by-step) execution model for algorithm
instructions. However, it is also possible to leverage parallelism/concurrency and recursion as alternative
models to drive the execution of algorithms’ instructions.

Parallel/concurrent execution models are typically used to optimize algorithm execution efficiency. As an
example, if you and a friend are buying tickets for a movie and there are three independent lines, you may opt
for a parallel processing model of execution by having you and your friend join two separate lines to buy the
tickets. In that case, you are guaranteed to be able to obtain the tickets quicker assuming one of the lines
operating in parallel with the other ends up serving customers faster, which is most often the case. Note that
executing the same algorithm simultaneously on a computer may not be possible if you only have one central
processing unit (CPU) in your machine. In that case, you can simulate parallelism by having the operating
system running on the machine execute the two algorithms concurrently as separate tasks while sharing the
single processor resources. This approach is less efficient than true parallelism. More detail on the differences
between concurrency and parallelism will be provided in Chapter 4 Linguistic Realization of Algorithms: Low-
Level Programming Languages.
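As a minimal sketch of the movie-ticket analogy, assuming the C++ standard thread library is available, the two “lines” below run as separate threads; whether they execute truly in parallel or are interleaved concurrently depends on how many CPU cores the machine has.

#include <iostream>
#include <string>
#include <thread>
using namespace std;

// Each "line" serves a ticket request independently.
void buyTicket(const string& who) {
    cout << who << " is buying a ticket\n";
}

int main() {
    // Start two tasks; on a multicore machine they may run in parallel,
    // while a single core forces the operating system to interleave them concurrently.
    thread line1(buyTicket, "You");
    thread line2(buyTicket, "Your friend");

    line1.join();  // wait for both tasks to finish
    line2.join();
    return 0;
}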

Recursive models of execution provide another elegant and effective alternative to the traditional sequential
model of execution. The problem-solving technique where a process calls itself in order to solve smaller
instances of the same problem is called recursion. It can be a powerful tool in programming because it allows
for elegant solutions to complex problems by breaking them down into smaller, more manageable parts. By
leveraging recursion, programmers can write concise and efficient code to solve a wide range of problems.

One of the key advantages of recursion is its ability to handle complex tasks with minimal code. Instead of
writing lengthy iterative loops to solve repetitive tasks, recursion allows programmers to define a process that
calls itself with modified input parameters, effectively reducing the amount of code needed. However, it’s
essential to be cautious when using recursion, as improper implementation can lead to stack overflow errors


due to excessive process calls. Programmers should ensure that recursive processes have proper base cases to
terminate the recursion and avoid infinite loops. Example:

#include <iostream>
using namespace std;

int recursiveSum(int x) {
    // Base case
    if (x == 0) {
        return 0;
    } else {
        // Recursive step
        return x + recursiveSum(x - 1);
    }
}

int main() {
    cout << recursiveSum(10);
    // Answer is 55
    return 0;
}

In an iterative version of this scenario, the process would involve gradually adding values to a total variable as
you step through a loop. However, a different approach involves leveraging computational thinking to deconstruct the problem,
breaking it down into smaller subcomponents. This method tackles these subcomponents individually to
address the overarching issue. When these smaller parts represent scaled-down versions of the original
problem, recursion becomes a valuable tool.

In practical scenarios, recursion often manifests as a function, which is a set of commands that can be
repeatedly executed. It may accept an input and may return an output. The base case represents the function’s
most straightforward operation for a given input. To effectively implement recursion, two primary steps must
be followed: (a) identify the base case, and (b) outline the recursive steps. In the context of a recursive
function, when n is 0, the cumulative sum from 0 to 0 is intuitively 0, representing the most fundamental
subproblem of the main issue. Armed with this base case, you can commence crafting the initial part of the
function.

int recursiveSum(int x) {
    // Base case
    if (x == 0)
        return 0;
}

Recursion operates through a process of simplification, progressively reducing the value of x until it meets the
base condition, where x equals 0. This technique presents an alternative method, offering a refined and
effective algorithmic solution for the current problem:

#include <iostream>
using namespace std;

int recursiveSum(int x) {
    // Base case
    if (x == 0) {
        return 0;
    } else {
        // Recursive step
        return x + recursiveSum(x - 1);
    }
}

int main() {
    cout << recursiveSum(10); // Output will be the sum = 55.
    return 0;
}

While it looks like recursion amounts to calling the same function repeatedly, it is only partially true, and you
should not think about it that way. What happens is much more than repeating the call of a function. It is more
useful to think of it as a chain of deferred operations. These deferred operations are not visible in your code or
your output—they are in memory. The program needs to hold them somehow, to be able to execute them at
the end. In fact, if you had not specified the base case, the recursive process would never end. Figure 2.9
illustrates a flowchart for an iterative solution that adds N numbers.

Figure 2.9 A flowchart represents an iterative solution for adding N numbers. (attribution: Copyright Rice University, OpenStax,
under CC BY 4.0 license)
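For comparison with the recursive version above, here is one way the iterative approach sketched in Figure 2.9 might look in C++; it accumulates the sum of the numbers 1 through N in a loop rather than through a chain of deferred recursive calls.

#include <iostream>
using namespace std;

int iterativeSum(int n) {
    int total = 0;
    // Add each value from 1 to n to the running total.
    for (int i = 1; i <= n; i++) {
        total = total + i;
    }
    return total;
}

int main() {
    cout << iterativeSum(10);  // Answer is 55, matching the recursive version
    return 0;
}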


CONCEPTS IN PRACTICE

Computational Thinking for Chess Problem-Solving


Computers can be used to help us solve problems. However, before a problem can be tackled, the problem
itself and how it could be solved need to be understood. Computational thinking transcends mere
programming; it doesn’t equate to thinking in the binary fashion of computers, as they fundamentally lack
the capacity for thought. Rather, while programming is the craft of instructing a computer on the actions to
execute, computational thinking empowers you to meticulously determine what those instructions should
be. Take, for instance, the strategic gameplay involved in chess. To excel in a chess match, a player must:

• Understand the unique movements and strategic values of each piece, recognizing how each can be
maneuvered to control the board.
• Visualize the board’s layout, identifying potential threats and opportunities, and planning moves
several steps ahead to secure an advantageous position.
• Recognize patterns from previous games, understanding common tactics and counters, to formulate a
robust, adaptable strategy.

In devising a winning strategy, computational thinking is the underpinning framework:

• The complex game is dissected into smaller, more manageable components (e.g., the function of each
chess piece, the state of the board)—this is decomposition.
• Attention is concentrated on pivotal elements that influence the game’s outcome, such as the
positioning of key pieces and the opponent’s tendencies, sidelining less critical factors—this is an
abstraction.
• Drawing from prior knowledge and experiences in similar scenarios, a step-by-step approach is
developed to navigate through the game—this is algorithmic thinking.

Should you venture into developing your own chess program or strategy, these are precisely the types of
considerations you would deliberate on and resolve before actual programming.

Testing and Debugging


Testing and debugging are techniques used to identify flaws in algorithms and defects in code to be able to
correct them. Test cases rely on providing specific input data to check whether a software program functions
correctly and meets its designed requirements. Test cases need to be identified to drive tests. If a test
associated with a test case fails, debugging needs to be conducted to identify the source of the problem and
correct it. In other words, debugging is about locating and fixing defects (i.e., bugs) in algorithms and
processes to make them behave as expected. In programming, everyone makes mistakes; they are part of the
learning process. The important thing is to identify the mistake and work out how to overcome it. There are
those who feel that the deepest learning takes place when mistakes are made.
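As a minimal sketch of what test cases look like in code, assuming the recursiveSum function from the previous section, the assert statements below each pair an input with its expected outcome; a failed assertion signals a defect that debugging would then locate and fix.

#include <cassert>
#include <iostream>
using namespace std;

int recursiveSum(int x) {
    if (x == 0) return 0;            // base case
    return x + recursiveSum(x - 1);  // recursive step
}

int main() {
    // Each assert is a test case: specific input data plus the expected outcome.
    assert(recursiveSum(0) == 0);
    assert(recursiveSum(1) == 1);
    assert(recursiveSum(10) == 55);
    cout << "All tests passed" << endl;
    return 0;
}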

In the jam sandwich algorithm, testing can be facilitated by taking turns to play the role of the programmer
who gives instructions as well as the robot. If you are a programmer, your job is to read out the instructions
and follow each step. You can choose to follow your pseudocode or your flowchart. Each instruction becomes a
test case, and the test succeeds if the robot can follow every instruction exactly and successfully. Otherwise,
you will need to debug the instruction to identify the source of the problem and correct it. Table
2.2 can be used to record problems encountered and improvements that need to be made.

Test Case    Input    Problem    Expected Outcomes    Observed Outcomes    Improvement    Responsible User

1.

2.

Table 2.2 Sample Table for Recording Test Problems and Improvements

INDUSTRY SPOTLIGHT

DNA Sequencing
Computational thinking is important in every industry today. In DNA sequencing, computational thinking
helps manage the massive and complex data involved. It starts by breaking the large DNA sequence into
smaller pieces. Then, it involves identifying patterns or sequences within these pieces, which might indicate
genetic information like the presence of certain genes. The focus is on the most relevant parts of the
sequence, discarding unnecessary data to concentrate on potentially significant genetic regions. Finally,
refined algorithms process and reconstruct the original sequence to identify genetic variations. This
approach is used for efficiently handling massive datasets in DNA sequencing and extracting meaningful
insights. The parts of computational thinking (CT) can be identified and highlighted in the process of DNA
sequencing, a complex task within the field of genomics:

• Decomposition: Break down the DNA sequencing process into specific steps such as sample collection
and DNA extraction.
• Pattern recognition: Identify similarities in DNA sequences that could indicate genetic traits or
anomalies.
• Abstraction: Focus on the essential parts of the genetic information that are relevant for the study at
hand.
• Algorithms: Create step-by-step protocols for each part of the sequencing process.
• Logical thinking: Determine the most accurate methods for sequencing based on the type of sample
and the required depth of sequence analysis.
• Evaluation: Assess the quality and accuracy of the sequencing data obtained.
• Debugging: Identify issues that may arise during the sequencing process.

Practical Computational Thinking Examples


Here are different real-life scenarios of practical applications of computational thinking with suggested
solution approaches to provide problem-solving and decision-making:

• Organizing a city’s recycling program to maximize efficiency. How can you ensure the most effective
collection routes and times?
• Solution: Use a route optimization algorithm to analyze and plan the most efficient paths for collection
trucks, considering factors like distance and traffic patterns.

• Planning the layout of a community garden to optimize space and sunlight exposure for different plant
types. How do you decide where to plant each type of vegetable or flower?
• Solution: Employ a simulation algorithm that models sunlight patterns, plant growth rates, and space
requirements to design a garden layout that maximizes space and plant health.

• Creating a schedule for a multistage music festival to minimize overlaps and ensure a smooth flow of


audiences. How do you schedule the performances across different stages?


• Solution: Implement a scheduling algorithm that considers audience preferences, artist availability, and
stage logistics to create a timetable that maximizes attendee satisfaction and minimizes conflicts.

• Determining the most efficient way to allocate computer resources in a cloud computing environment to
handle varying user demands. How do you manage the computational load?
• Solution: Use load balancing algorithms to distribute tasks across servers dynamically, ensuring optimal
resource utilization and maintaining system performance.
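Returning to the last scenario above (allocating computer resources in a cloud environment), here is a minimal round-robin sketch in C++; production load balancers also weigh server capacity and current load, so this only illustrates the basic idea of spreading tasks across servers.

#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> servers = {"server-A", "server-B", "server-C"};
    size_t next = 0;  // index of the server that receives the next task

    // Round-robin: hand each incoming task to the next server in turn.
    for (int task = 1; task <= 7; task++) {
        cout << "Task " << task << " -> " << servers[next] << endl;
        next = (next + 1) % servers.size();
    }
    return 0;
}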

2.2 Architecting Solutions with Adaptive Design Reuse in Mind


Learning Objectives
By the end of this section, you will be able to:
• Describe how business solutions design heuristics and how patterns are used
• Discuss the role of enterprise architecture (EA) and related architecture domains
• Differentiate between enterprise and solution architecture

While computational thinking commonly employs a bottom-up strategy for crafting well-structured
components, adaptive design reuse adopts a top-down methodology, emphasizing the creation and assembly
of business solutions by combining existing design components. A design component is a reusable element
within a larger system that serves a specific purpose. These components, akin to low-level solution building
blocks, necessitate minimal modifications. This approach, often termed computing in the large, diverges from
individual algorithmic instructions. Instead of composing instructions for the execution of specific actions
through computational thinking’s decomposition, adaptive design reuse identifies fitting components based
on the articulated requirements of the business solution’s potential users, commonly referred to as
stakeholders. The method(s) used to identify these needs will be discussed in detail in Chapter 9 Software
Engineering. While adaptive design reuse typically considers algorithmic designs as building blocks, it uses
similar techniques (i.e., decomposition, logical thinking and pattern recognition, abstraction/generalization,
componentization, testing, and debugging) to identify and reuse building blocks.

The structural designs that stem from the top-down adaptive design reuse approach to business solution
design are referred to as business solutions architecture models. A business solution architecture is a
structural design that is meant to address the needs of prospective solution users. When a structural design
meets these needs, it is considered “architecturally sound” for the problem at hand. Furthermore, to accelerate
the putting together of complete solutions, the adaptive design reuse approach relies on architectural models
or component assemblies that were developed from previous designs. When the granularity of these
architectural models is at the level of subsystems, they are referred to as system family architectures or
architectural patterns. An architectural pattern is a reusable solution to a recurring problem in software
architecture design. Chapter 10 Enterprise and Solution Architectures Management provides more detail
about the various levels of patterns and how to organize them and bookkeep them into pattern catalogs to
facilitate reuse.

Business Solutions Design Patterns


Business solutions are strategies/systems created to solve specific challenges in a business. They aim to make
operations more efficient, improve decision-making, and contribute to overall success. Creating business
solutions is a multifaceted and intricate process, demanding knowledge in different technological domains
and the relevant business sector. A crucial step in this procedure is outlining the solution’s architecture, serving
as a master blueprint, and guiding the entire design and implementation process. A blueprint is a detailed
plan or design that outlines the structure, components, and specifications of a building, product, system, or
process.

Computational thinking and adaptive design reuse are simply methods used to bootstrap and/or accelerate
the design of business solutions by providing a comprehensive set of methods for developing software
solutions to business problems—for example, gathering solutions needs, building, and deploying turnkey
solutions. These solutions will be discussed in detail in Chapter 9 Software Engineering.

As explained earlier, the computational thinking and adaptive design reuse approaches only provide high-level
techniques to help create or reuse components. As a result, one of the typical criticisms is that these
approaches are too vague, as it is sometimes not clear how they differ from other forms of thought. This is
why experts at applying these methods are usually seasoned architects who can instinctively recognize when
various types of components may be reused. When you ask Thoughtworks’ enterprise architecture expert
Martin Fowler why a particular business solution architecture is sound in a given context, he will often reply
that it simply “smells good” to him.

Rather than looking at computational thinking and adaptive design reuse as a best practice set of techniques,
it is best to consider them as process patterns that may be applied and adapted to help create a business
solution architecture model, which represents the content of the application architecture. Process patterns
may, in turn, leverage other process patterns that are more focused. When process patterns that are an
inherent part of the process of creating a work product become rules of thumb, they are referred to as
heuristics.

INDUSTRY SPOTLIGHT

Case Study: Making Online Shopping Easier with Smart Design


Imagine an online store called SwiftShop. They had a problem. Even though they had lots of great products
to buy, people were leaving their website without buying anything. It was like having a store full of
customers who walked in and out without buying anything! The business challenge was that SwiftShop wanted to
make shopping on their website easier and more fun. They wanted people to stay longer, buy more, and
come back again and again. SwiftShop decided to use smart design tricks to fix their website and make it
super easy to shop. Here’s how they did it:

• User interface (UI) design patterns:
◦ Breadcrumb navigation: SwiftShop added breadcrumb trails so shoppers could easily see where they were on the website and find their way back if they got lost.
◦ Progress indicators: During checkout, SwiftShop put in progress bars so shoppers knew how far along they were in the buying process.
• Information architecture (IA) design patterns:
◦ Card sorting: SwiftShop asked real shoppers to help organize their products. By sorting cards and asking people how they’d group things, SwiftShop made it easier to find what customers were looking for.
◦ Faceted search: SwiftShop added filters so shoppers could narrow down their searches by attributes like price, size, and brand.
• Interaction design (IxD) patterns:
◦ One-click purchase: To speed up the buying process, SwiftShop let registered users buy with just one click.
◦ Personalized recommendations: SwiftShop used machine learning algorithms to suggest products that customers might like based on what they’ve bought before.

• Results: After making these changes, SwiftShop saw great results:


◦ More people buying: With the new features, more people ended up buying products from SwiftShop.
◦ Happier shoppers: Customers spent more time on the website, which meant they liked shopping
there more.
◦ More repeat customers: Because it was easier to shop, people kept coming back to SwiftShop again
and again.

So, by using smart design tricks like breadcrumb navigation, progress indicators, card sorting, faceted
search, one-click purchase, and personalized recommendations, SwiftShop made their online store a better
place to shop.

Layering and Componentization


Two heuristics are inherent to the design of business solutions and the creation of business solution
architectures. These heuristics, known as layering and componentization, can be thought of as design
approaches applied with architectural concerns in mind.

Componentization has already been introduced as a computational thinking and adaptive design reuse
technique.¹ Layering in business solution architecture involves creating distinct layers that abstract specific
aspects of the overall architecture. This heuristic enables the separation of concerns between layers and
improves modularity. Each layer comprises components, and a layer may consist of multiple components. This
hierarchical structure helps organize and separate different functionalities or responsibilities within the
architecture, making it easier to manage and understand.

The layering strategy is based on the concept of separating different concerns. This method structures
software design into distinct, stacked layers, with each layer tasked with specific functions. Commonly,
business solution architectures are organized into three main layers: the presentation, business logic, and data
management layers. The presentation layer is the user’s touchpoint, handling the user interface (UI) and
delivering the user experience (UX), which is the overall experience that a person has when interacting with a
product, service, or system. The business logic layer holds the business logic for the business solution or
application, separating the UI/UX from the business-centric computations. This separation affords the
flexibility to adjust business operations as needs evolve without disrupting other system components.
Although not tied to a specific domain, the data management layer is responsible for interacting with
persistent storage systems like databases and various data processing mechanisms. In a layered architecture,
information and commands flow through each layer, enhancing the design’s abstraction level, which denotes
the granularity or detail at which a system is represented. Despite its structured approach and abstraction
benefits, software designed in this layered manner might lean toward a monolithic build, potentially making
modifications challenging. Layered architecture can lead to tightly connected layers in software, making it
difficult to change one part without affecting others. This tight connection can also make it harder to update
or expand the software.
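To make the separation of concerns concrete, here is a minimal sketch of the three layers in Python; the class
names, order data, and business rule are illustrative assumptions, not part of any particular framework:

```python
# Minimal sketch of a three-layer architecture (illustrative names and data only).

class DataManagementLayer:
    """Data management layer: talks to persistent storage (an in-memory dict here)."""
    def __init__(self):
        self._orders = {}          # stands in for a real database

    def save_order(self, order_id, order):
        self._orders[order_id] = order

    def load_order(self, order_id):
        return self._orders.get(order_id)


class BusinessLogicLayer:
    """Business logic layer: applies business rules, independent of the UI."""
    def __init__(self, data_layer):
        self._data = data_layer

    def place_order(self, order_id, quantity, unit_price):
        if quantity <= 0:                      # a business rule, kept out of the UI
            raise ValueError("Quantity must be positive")
        order = {"quantity": quantity, "total": quantity * unit_price}
        self._data.save_order(order_id, order)
        return order


class PresentationLayer:
    """Presentation layer: handles user-facing input and output."""
    def __init__(self, logic_layer):
        self._logic = logic_layer

    def submit_order_form(self, order_id, quantity, unit_price):
        order = self._logic.place_order(order_id, quantity, unit_price)
        return f"Order {order_id} confirmed, total ${order['total']:.2f}"


# Each layer depends only on the layer directly below it.
ui = PresentationLayer(BusinessLogicLayer(DataManagementLayer()))
print(ui.submit_order_form("A-100", quantity=3, unit_price=19.99))
```

Because the business rule lives only in the middle layer, it can be changed without touching the user
interface or the storage code, which is the flexibility described above.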

A monolithic structure is a type of system or application where all the parts are closely combined into one
single unit. This setup means that everything from the user interface to data handling and processing is
interconnected within one big software application. While this can make the system easier to set up and
manage initially, it also means that changing one part can require changes throughout the whole system,
making it less flexible. As illustrated in Figure 2.10, there are many more recent variations of layered
architectures that also provide complementary capabilities, such as tiered architectures, service-oriented
architectures (SOA), and microservices architectures.²
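As a rough illustration of the microservices style mentioned above, the sketch below shows one small,
independently deployable service exposing an HTTP resource API. It assumes the third-party Flask package is
installed; the service name, route, and data are hypothetical:

```python
# Hypothetical "order" microservice: one small service running in its own process,
# communicating over a lightweight HTTP resource API (requires the Flask package).
from flask import Flask, jsonify

app = Flask(__name__)

# In a real deployment this service would own its own data store.
ORDERS = {"A-100": {"status": "shipped", "total": 59.97}}

@app.route("/orders/<order_id>")
def get_order(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(order)

if __name__ == "__main__":
    # A separate "inventory" or "billing" service would run as its own process
    # on its own port and could be deployed and updated independently.
    app.run(port=5001)
```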

1 Componentization was used initially in component-based business solution architectures that were derived from the Object
Management Group’s Object Management Architecture (OMA), including CORBA 3, JEE, DCOM, and COM+.
2 Per an article written by Thoughtworks’ Martin Fowler in 2014, the microservice architectural style is an approach to developing a
single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms,
often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated
deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different
programming languages and use different data storage technologies.

Figure 2.10 The different layered architectures include tiered architectures, service-oriented architectures (SOA), and microservices
architectures. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Detailed coverage of these specific architectures is beyond the scope of this chapter. They will be discussed
in more detail in multiple later chapters. It’s fundamental to understand that each software architecture model
is crafted to address the key challenges found in its preceding models. Being well-versed in various
architectural approaches equips you to devise a robust and effective architecture for your specific project.
While no software architecture can claim absolute perfection, an approach can be deemed highly suitable or
“relatively perfect” if it aligns well with the specific requirements and goals of your current project.

Enterprise-Level Architecture Domains


Enterprise-level architecture encompasses various domains that define the structure, components, and
operations of an entire organization. These domains provide a comprehensive framework for managing an
enterprise.

Up to this point, our efforts have been centered on employing the adaptive design reuse methodology to
construct business solution architectures tailored to individual projects. Within this scope, we aim to facilitate
the conversion of solution requirements into a cohesive solution concept, comprehensive business and IT
system outlines, and a collection of implementation activities, essentially forming the project plan. However,
because the adaptive design reuse approach is top-down, it is possible to start applying it at a higher level. The
enterprise level is typically the highest level of an organization, and it covers all strategic and tactical functions.
An enterprise often spans multiple organizations.

Enterprise architecture (EA) emerged at the beginning of the information age in the 1970s, and various
enterprise architecture frameworks (EAFs) were developed and used over time. The Open Group Architecture
Framework (TOGAF) is one such framework. According to TOGAF, the EA domains are the business, data,
application, and technology architectures. While specific frameworks may vary, Table 2.3 illustrates the
common enterprise-level architecture domains.


Architecture | Purpose | Components
Business architecture | Defines the organization’s business strategy, goals, processes, and functions | Business models, processes, capabilities, and organizational structure
Information architecture | Manages data and information assets | Data models, information flow diagrams, data governance, data standards, and metadata
Application architecture | Designs and organizes software applications | Application portfolio, application integration, and interface design
Technology architecture | Specifies the hardware, software, and technology infrastructure | Servers, networks, databases, cloud services, security protocols
Security architecture | Ensures the protection of information assets, systems, and networks | Security policies, access controls, and encryption mechanisms
Integration architecture | Facilitates seamless communication and data exchange | Middleware, messaging systems, and integration patterns
Process architecture | Defines and optimizes business processes | Process models, workflow diagrams, and performance metrics

Table 2.3 Common Enterprise-Level Architecture Domains

EA adopts a holistic perspective, treating the enterprise as a unified system or a system of systems. Its primary
aim is to integrate disparate processes, both manual and automated, into a cohesive environment that is
responsive to change and aligned with the enterprise’s business strategy. EA involves designing
comprehensive systems to support business processes, going beyond individual software or technology
systems. By standardizing and integrating fundamental processes across various business divisions, EA
facilitates enterprise-wide alignment and optimization of operations.

Solution architecture complements EA by tailoring specific system solutions to meet defined business and
technological needs. While EA focuses on defining the present situation and future goals of the organization,
solution architecture ensures the timely availability of suitable systems to fulfill business requirements.

TOGAF architectures promote the reuse of building blocks but lack a prescriptive method for managing them.
Adaptive design reuse becomes valuable in this context, involving understanding enterprise architectures and
adapting preexisting models and implementations. Detailed management of EA, solution architectures, and
leveraging EAFs is explored further in Chapter 10 Enterprise and Solution Architectures Management.

Enterprise Business Architecture and Model


The enterprise business architecture (EBA) is a comprehensive framework that defines the structure and
operation of an entire organization. It is related to corporate business and the documents and diagrams that
describe the architectural structure of that business. It characterizes the processes, organization, and location
aspects of the EBA at hand. EBAs usually consist of different business capabilities grouped into categories, as
illustrated in Figure 2.11. In this example, sample categories include product, sales, and service. The business
capabilities shown under the various categories typically have business models of their own that are part of
the overall organization’s business model. Note that it is clear from this diagram that the layering and
componentization heuristics play an important role in structuring capability models.

Figure 2.11 The EBA framework categories, including product, sales, and service, support the business organizations. (attribution:
Copyright Rice University, OpenStax, under CC BY 4.0 license)

Business architecture modeling helps extract the overall business model of an organization’s capability, which
includes the set of underlying processes necessary to operate the various capabilities of the organization.

A business model is a framework that outlines how a business creates, delivers, and captures value. It covers
the way a company generates revenue, identifies its target audience, and defines the products/services it
offers. The business model includes the organization, location, and process model, as described in the
following sections. It may include additional models, such as the financial model, which will not be covered
here. To illustrate what a business model entails practically, we will focus on a typical financial instruments
trading capability, which could be one of the capabilities in the overall business model of an organization
operating in the global markets industry. The organization, location, and process models are typically
represented at a logical level of abstraction, which is the most detailed representation for these types of
models. Note again that layering and componentization heuristics play an important role in structuring these
models.

Organizational Model

The organizational model is the structure and design of an organization, outlining how roles, responsibilities,
and relationships are defined. It provides a framework for understanding how various components within the
organization interact and collaborate to support its functions. In addition, it offers clear definitions of roles and
a matrix mapping of processes to roles.

Role definitions encapsulate a set of functional capabilities essential for an individual to manage one or several
distinct business processes, as shown in Table 2.4. It is common for an individual to fulfill multiple roles. The
process/role matrix delineates which business roles are accountable for particular business processes.
Typically structured at the elementary business process level, this matrix can also be extended to more
overarching levels when beneficial.

Role Title | Responsibilities | Reporting
DBA (database administrator) | Managing and monitoring the database activities | IT manager or VP of IT
CIO (chief information officer) | Overall IT leadership | CEO (chief executive officer)
HR employee | Human resources operations | HR manager

Table 2.4 Organizational Roles


In general, the role matrix is a visual representation or chart that outlines the various roles within an
organization and their respective responsibilities. It helps in clarifying and communicating the distribution of
tasks and authorities among different positions or departments.
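One simple way to picture a process/role matrix is as a mapping from roles to the business processes they are
accountable for. The sketch below uses hypothetical roles and process names purely for illustration:

```python
# Hypothetical process/role matrix: which roles are accountable for which
# business processes (illustrative data only).
process_role_matrix = {
    "DBA": ["Monitor database activity", "Back up trading data"],
    "CIO": ["Approve IT strategy", "Review security policy"],
    "HR employee": ["Onboard new hires", "Maintain employee records"],
}

def roles_for_process(process_name):
    """Return every role accountable for a given business process."""
    return [role for role, processes in process_role_matrix.items()
            if process_name in processes]

print(roles_for_process("Back up trading data"))   # ['DBA']
```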

In a financial instruments trading business model, various roles contribute to the overall functioning of the
organization, as shown in Table 2.5.

Title | Roles and Responsibilities
Traders | Manage trading portfolios; develop strategies to maximize profits and manage risk
Sales team | Build and maintain client relationships; understand client needs and offer suitable products and services
Back-office operations³ | Process and settle trades; handle trade confirmation and reconciliation
Risk managers | Monitor and manage risk exposure; ensure compliance with risk management policies
Legal advisors | Provide legal advice and services; ensure compliance with laws and regulations
Technology professionals | Develop and maintain trading systems and IT infrastructure; ensure the security and efficiency of technology resources
Human resources | Manage recruitment, training, and employee relations; ensure the organization is staffed with skilled and motivated employees

Table 2.5 Roles and Responsibilities within a Financial Organization

3 Often referred to as “back-office personnel.” A distinction may be necessary between back-office and front-office functionality:
front-office operations typically work alongside traders and salespeople; they facilitate telephone calls, process trades, and resolve
trading and sales/customer issues. Front-office personnel would consist of assistant traders and salespeople.

In addition to these titles, there are two primary management layers: upper management and systems
management. Upper management is tasked with the supervision of various sectors within the organization,
encompassing numerous business units, operations departments, and other key areas. This tier includes roles
such as division managers, group managers, and risk managers, who collectively ensure the smooth operation
and strategic direction of the business. On the other hand, systems management is dedicated to the routine
functioning of the trading system. This includes positions like site manager, systems maintenance manager,
content manager, and help desk personnel, all of which collaborate daily to ensure the trading system is
operational, well-maintained, and user-friendly.

Process Model

The business process is a series of interrelated tasks, activities, or steps performed in a coordinated manner
within an organization to achieve a specific business goal. The business process model can be encapsulated
within a framework of business process hierarchies. These hierarchies visually illustrate the outcome of
breaking down complex processes, a method known as process decomposition. This decomposition is carried
out until it reaches the elementary business process (EBP), which is a fundamental and indivisible activity
within a business that is not further subdivided into smaller processes (i.e., the smallest unit of business
activity). Figure 2.12 illustrates an example of business process modeling for creating an order.

Figure 2.12 This flowchart outlines the business process model for creating an order, checking availability, shipping, and payments.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

EBPs possess distinct characteristics that make them noteworthy in the context of business operations. They
are considered significant and candidates for in-depth analysis. EBPs are typically associated with the actions
of a single user, emphasizing a focused responsibility within the organization. Furthermore, these processes
may encompass both manual and automated activities, reflecting the diverse nature of contemporary
business workflows. User involvement in EBPs is concentrated at a specific point in time, streamlining the
temporal aspect of these activities. The preservation of crucial business distinctions between processes
ensures clarity and coherence. Finally, EBPs aim to leave the conceptual entity model in a consistent state,
contributing to data integrity and maintaining a reliable representation of the organization’s entities and
relationships, where the entity model represents the various objects or concepts and their relationships within
a system. A process map displays the order of chosen processes from the process hierarchies, highlighting
their connection to the roles responsible for executing them. A business process hierarchy organizes a
company’s activities from broad, general processes down to specific tasks, making it easier to manage and
improve how the business operates. Figure 2.13 illustrates a high-level business process hierarchy that could
be used for any manufacturing business process.


Figure 2.13 The trading business process hierarchy includes activities such as customer management, product development, quality
control, order fulfillment, and customer payments. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

At the top of the hierarchy, there are three core business processes:

1. Customer management: This process plays a central role, interacting with various activities designed to
acquire, retain, and develop customer relationships. It focuses on all aspects of customer interaction.
There are three subsections as follows:
◦ 1.1. Target customer: Customers first discover a company and its offerings through various
marketing efforts. They then evaluate these offerings to decide if they meet their needs, leading to a
purchase decision. After buying, customers may require support, which is provided to ensure
satisfaction. A positive experience can turn these customers into loyal buyers and advocates for the
company and maintain strong customer relationships.
◦ 1.2. Lead team: This involves identifying potential leads and assessing their likelihood to purchase. It
includes:
▪ 1.2.1. Outreach: This includes reaching out to customers (current and new) with promotions and
advertisements.

◦ 1.3. Decision-making: This step focuses on the creation and evaluation of product prototypes to
ensure they meet the required standards before launch.

2. Product development: This encompasses the entire life cycle of a product from initial idea generation
through design and development to launch. There are four subsections:
◦ 2.1. Product design: This focuses on the creation and evaluation of product designs to ensure they
meet the required standards before testing.
◦ 2.2. Quality assurance: QA involves checking the quality of the design and matching it with the listed
requirements.
◦ 2.3. Define raw materials: Raw materials are the basic substances needed to make products. These
can be things taken from nature, like wood, oil, or metals, which are used in making everything from
buildings to clothes and food. The type of raw material used affects how products are made, their
quality, and how much they cost. It includes:
▪ 2.3.1. Managing materials: Managing materials well is important for businesses to make sure they
have enough to meet demand, keep production running smoothly, and reduce waste, making the
whole process more efficient and sustainable.

◦ 2.4. Testing planning: Writing detailed instructions for setting up and conducting usability tests,
including how to select participants and record feedback, is important and includes:
▪ 2.4.1. Testing guidelines: Rules that help ensure a product or service works well and is safe before
it’s made available or used. These rules guide how to test the item, what tools to use, and what
problems to look for, aiming to fix any issues found. The main goal is to make sure the product
does what it’s supposed to do and meets quality standards.
▪ 2.4.2. Quality guidelines: These are rules that help ensure products or services are consistently
good and meet customers’ needs. By following these guidelines, companies aim to make their
customers happy, reduce mistakes, and work more efficiently. This involves regularly checking
and improving how things are done to make sure they meet high standards.

3. Order to cash: This is a comprehensive business flow that starts when a customer places an order and
ends when the payment is received and recorded:
◦ 3.1. Order management: The process begins when a customer places an order through a chosen
method, such as an online platform, phone call, or sales representative.
◦ 3.2. Credit management: A credit check is performed to ensure the customer has the necessary
credit to make the purchase. If the customer passes the credit check, the order is approved for
processing. Otherwise, it may be held until payment is secured or canceled.
◦ 3.3. Fulfillment: The required products are allocated from inventory. If products are not available, this
may trigger a back-order or manufacturing process.
◦ 3.4. Payment: The customer makes a payment using one of the accepted payment methods.
▪ 3.4.1. Invoicing: This step will start if the customer pays with a purchase order (i.e., the customer
may use different payment methods such as credit card or money transfer). Once the order is
shipped, an invoice is generated and sent to the customer, detailing the amount paid and/or due
for the products or services provided.

◦ 3.5. Reporting: Reports are generated to analyze the sales volume, payment collection efficiency, and
customer buying patterns.

EBP definitions include brief explanations of the activities within each process. These descriptions, when
paired with the appropriate process flow, serve as a foundation for progressing to intricate design stages and
aid in the ongoing creation of functional specifications. Furthermore, they highlight any presumptions
established during the architectural development, delineate rules for decisions that require manual
intervention, and, where relevant, provide cross-referencing details to ensure compatibility with the functional
demands of the application being developed.

The assignment of numbers to EBPs mirrors their placement within hierarchical structures, enabling their
linkage to various work products throughout the business, organizational, and location-specific domains,
along with pertinent models.
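For example, the “order to cash” branch of the hierarchy described earlier could be captured as a nested
structure whose keys mirror the EBP numbering. This is only a sketch, limited to the processes named in this
section:

```python
# Sketch of the "order to cash" branch of the process hierarchy, with numbers
# mirroring each process's position in the decomposition.
order_to_cash = {
    "3": {"name": "Order to cash", "children": {
        "3.1": {"name": "Order management"},
        "3.2": {"name": "Credit management"},
        "3.3": {"name": "Fulfillment"},
        "3.4": {"name": "Payment", "children": {
            "3.4.1": {"name": "Invoicing"},
        }},
        "3.5": {"name": "Reporting"},
    }},
}

def list_processes(node):
    """Walk the hierarchy and print each process with its assigned number."""
    for number, process in node.items():
        print(number, process["name"])
        list_processes(process.get("children", {}))

list_processes(order_to_cash)
```

Because the numbers encode the position in the hierarchy, an EBP such as 3.4.1 can be cross-referenced
from organizational, location, and other work products without ambiguity.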

Process map diagrams illustrate the order of chosen processes from the process hierarchies, focusing on their
connection to the roles responsible for executing them. The process maps specifically highlight key processes
that hold particular business importance. Although every process featured in a process map is also included in
the business process hierarchy, not all processes in the hierarchy are depicted in the process maps. Process
flow diagrams can depict various elements, such as the business events triggering a process, outcomes of
completing a process, the processes themselves, the sequence of process steps, interruptions in the process
flow, possibilities for repetition, choices, exclusivity among processes, and any additional notes. Figure 2.14
illustrates a process map diagram for an order to cash business process.


Figure 2.14 The order to cash business process flows from order management through product fulfillment and payment.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Figure 2.15 illustrates the process of managing orders within a business. It begins with order management,
where orders are received and verified for accuracy. Next is credit management, which assesses the
customer’s credit to ensure they can fulfill payment requirements. The process then moves to fulfillment,
where the order is prepared and shipped. Then, during payment, the customer is invoiced and payment is
processed. Invoicing is a detailed part of this step, where an invoice is generated and sent to the customer. The
final stage is reporting, where the business generates reports on sales and financial transactions. This process
ends once all steps are completed, ensuring each order is handled efficiently from start to finish.

Figure 2.15 This sample trading business model represents a process map diagram for the enter order process. (attribution:
Copyright Rice University, OpenStax, under CC BY 4.0 license)

Location Model

A location model refers to a set of rules used to analyze and make decisions related to the positioning of
entities, activities, or resources. The conceptual location model shows how business processes will be
distributed geographically. Within the location model, definitions of location types specify the identification of
locations based on both their type and general geographic area. Additionally, the location model contains a
matrix illustrating the relationship between processes and locations, indicating the specific location types
where each process takes place.

Enterprise Technical Architecture


The enterprise technology architecture (ETA) is a comprehensive framework that defines the structure,
components, and interrelationships of an organization’s technology systems to support its business processes
and objectives. In general, the ETA guides the development and support of an organization’s information
systems and technology infrastructure.⁴ Therefore, it characterizes the organization’s application, data, and
technology infrastructure architectures and describes their models. The technology (virtual infrastructure)
supports the execution of information systems services, which in turn support business functions and related
services. Application, data, and technology infrastructure architecture models may be described via blueprints
at different levels of abstraction, including presentation, conceptual, logical, and physical. Layering and
componentization heuristics play an important role in structuring these models.

The abstraction level indicates the extent to which a design is distanced from tangible technological details.
This level of abstraction can differ across various architectural fields, but four main levels are widely
recognized:

1. Presentation: At this level, architecture is simplified into a form to streamline the communication of
essential ideas, particularly to business executives. Distilling complex architectural diagrams into more
straightforward, high-level visuals ensures that the core messages are conveyed effectively.
2. Conceptual: This level offers a more structured view of architecture, deliberately omitting many
specifics. The rationale for not including certain details is that they may not be relevant to the diagram’s
intended purpose or are yet to be decided.
3. Logical: The architecture is depicted in detail but remains independent of specific technologies. It aims to
give a full outline without committing to the particular technologies that will be employed for the final
build.
4. Physical: This level focuses on the actual technological specifics used or to be used in the architecture’s
realization, making it the most concrete representation of the four.

CONCEPTS IN PRACTICE

Manufacturing Industry
Business operations are commonly articulated through a value chain framework, a concept originating
from manufacturing practices. Analogous to the manufacturing industry where raw materials like steel are
transformed into various components, each step in the value chain plays a role in creating a finished
product, such as a car. In our approach, we’ve utilized the componentization heuristic to break down
business operations into distinct steps constituting the elements of a value chain. Additionally, the layering
heuristic has been applied to structure the value chain as layers of elements. Each layer in this model relies
on the output of the preceding layer to perform its specific function. This methodology enhances the
understanding of how value is sequentially added throughout the business process.

Application Architecture Model

The application architecture is a subset of the enterprise solution architecture that includes a process to
architect and design the application architecture as well as the actual application architecture model, which
represents the content of the application architecture. Application architecture helps organizations decide how
to invest in new software and systems. It makes sure that any new software being looked at, designed, or put
into use can work well with the systems the organization already has. This includes brand-new software,
updates to old software, making old software better, and buying and updating software packages.

4 Information systems architectures combine the use of application and data architecture details and specify control and data flow
necessary to operate the systems.


Figure 2.16 illustrates a conceptual application architecture model of the sample trading business model that
was introduced earlier in this section. The goal of a conceptual application architecture is to demonstrate how
members of the organization interact with the application architecture from different locations when invoking
business processes. This example illustrates three distinct offices accommodating a diverse range of users,
including management, trading, IT, and others.

Figure 2.16 A conceptual trading application architecture moves through various levels, including users, functions, enterprise
services, and third-party systems. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Figure 2.17 takes the conceptual trading application architecture a bit further and describes what one of the
alternative application architectures may look like at the logical level. The goal of the logical application
architecture is to identify the various functional blocks within the architecture without getting into the details
of how they connect with and/or use each other.

Figure 2.17 The logical trading application architecture illustrates each function block for the processing of customer orders.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

THINK IT THROUGH

Enterprise Business Architecture


What is the mechanism that prompts the creation of organization, location, and process domains to
describe an (enterprise) business architecture? How does decomposition help describe (enterprise)
technology architectures?

Hint: The organization domain focuses on the structure, roles, responsibilities, and relationships within the
enterprise. The location domain deals with the physical or virtual places where business activities take
place. The process domain delves into the business processes that drive the value chain and operational
activities.

The process involves dividing a system or architecture into smaller, more discrete elements, which aids in
analysis, understanding, and effective communication. Technology architectures in enterprises can be
highly complex, involving numerous components and interdependencies. Decomposition simplifies this
complexity by breaking down the architecture into smaller, comprehensible pieces. Each component can
then be analyzed and understood independently.

Figure 2.18 is a simplified view of the logical trading application architecture model that provides callouts to
explain what the various functions are meant to accomplish. It is good practice to document the functions in
that way. It is also good practice to provide an additional diagram (not included here) that includes callouts
identifying the actual third-party technologies that are used to implement the various functions.


Figure 2.18 This logical trading application architecture includes function callouts that identify the technologies used for this
process. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Data Architecture Model

A data architecture model is a conceptual framework that outlines how an organization structures,
organizes, and manages its data assets. Data architecture forms a component of the broader enterprise
solution architecture. It encompasses the design process for both information and actual data architecture
models, representing the content within the data architecture. This framework aids organizations in
strategically planning investments in data management solutions and associated systems. The evaluated,
designed, and delivered data management solutions must seamlessly coexist with established ones,
managing newly developed databases as well as legacy database extensions.

In general, information can be extracted from raw data, knowledge can be gleaned from information, and
wisdom can be obtained from knowledge. Most enterprises refer to the relationship between data,
information, knowledge, and wisdom as the pyramid of knowledge. A wealth of additional information related
to data management and the pyramid of knowledge is provided in Chapter 8 Data Management. Information
modeling is the process used to describe the metadata necessary to understand the data, processes, and rules
that are relevant to the enterprise, as illustrated in Figure 2.19.

Figure 2.19 Examining and learning from data is integral to an organization’s success, as outlined in this role of information
modeling figure. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

In the collaborative process of data modeling, IT and business stakeholders establish a shared understanding
of essential business terms, known as entities, which typically end up being represented as tables that contain
data in a relational database management system. This involves defining the attributes that characterize these
terms and establishing the relationships between them. The capability to uphold and document the data
model is integral to an organization’s capacity to address varied data acquisition needs across critical business
projects. In essence, a well-maintained data model serves as a foundational element for ensuring coherence
and effectiveness in managing diverse data requirements.
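As a small, hypothetical illustration of entities becoming tables, the sketch below uses Python’s built-in
sqlite3 module to define two entities, their attributes, and the relationship between them; the table and
column names are assumptions made for illustration only:

```python
# Two hypothetical entities ("customer" and "trade") expressed as relational
# tables, with a foreign key capturing the relationship between them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE trade (
        trade_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        instrument  TEXT NOT NULL,
        quantity    INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Fund')")
conn.execute("INSERT INTO trade VALUES (10, 1, 'XYZ', 500)")

# Each row is a record; each column is an attribute of the entity.
for row in conn.execute("""
        SELECT c.name, t.instrument, t.quantity
        FROM trade t JOIN customer c ON c.customer_id = t.customer_id"""):
    print(row)   # ('Acme Fund', 'XYZ', 500)
```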

Figure 2.20 is an example of an enterprise-level conceptual data architecture for an insurance company. The
goal of the enterprise conceptual data architecture is to illustrate the various types of data repositories and
the way data is collected and managed within the enterprise.

Figure 2.20 The enterprise conceptual data architecture for a fictitious insurance company highlights the different ways in which
data is handled and managed. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Concerning the data that flows through the application architecture, there may be many types of data (e.g.,
relational and NoSQL) and various ways to represent the corresponding data architecture models. A relational
database organizes data into tables where each row represents a unique record and each column represents a
property of that record. It uses a language called SQL to manage and query the data. NoSQL databases are
designed to store and manage data that doesn’t fit neatly into tables and can handle various types of data
models like documents or graphs.
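To make the contrast concrete, the same hypothetical order could be stored as flat rows in relational tables or
as one self-contained, nested document of the kind a document-oriented NoSQL database would hold:

```python
import json

# Relational style: flat rows in separate tables, linked by key columns.
order_row = {"order_id": 42, "customer_id": 7, "total": 59.97}
order_item_rows = [
    {"order_id": 42, "sku": "SKU-1", "quantity": 2},
    {"order_id": 42, "sku": "SKU-9", "quantity": 1},
]

# Document style: one nested document per order, holding everything about it.
order_document = {
    "order_id": 42,
    "customer": {"customer_id": 7, "name": "Acme Fund"},
    "items": [
        {"sku": "SKU-1", "quantity": 2},
        {"sku": "SKU-9", "quantity": 1},
    ],
    "total": 59.97,
}

print(json.dumps(order_document, indent=2))
```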


INDUSTRY SPOTLIGHT

Design Reuse
Adaptive design reuse plays a crucial role across various industries today, including the health-care sector,
where its impact is profoundly significant. Understanding and leveraging adaptive design reuse in health
care can lead to innovative solutions for complex medical problems, enhancing patient care and treatment
outcomes. One prime example of its application is in the design of artificial valves that can replace natural
heart valves in people.

Heart valve disease is a condition where one or more valves in the heart do not function properly, leading
to disrupted blood flow. The traditional solution involves surgical replacement with prosthetic valves, which
can be derived from biological sources or made from synthetic materials. Adaptive design reuse in this
context refers to the innovative process of designing these prosthetic heart valves by repurposing existing
materials, technologies, and design principles from within or outside the medical field. This approach can
accelerate the development of more effective, durable, and safer heart valve replacements.

Infrastructure Architecture (Infrastructure Pillars)

Technology architecture is a fundamental component of enterprise architecture, supported by four main
pillars: compute, memory, storage, and network. It outlines the organization and functionality of an
enterprise’s solutions or system’s technology framework. This encompasses the configuration of client and
server hardware, the applications operating on this hardware, the services these applications provide, and the
protocols and networks facilitating communication between applications and hardware components. It’s
important to distinguish technology architecture from system architecture. The system architecture deals
with applications and data, how they are related to each other, and what business process they support
together. The technical architecture includes the software and hardware capabilities to fully enable
application and data services.

Refer to Figure 10.36 for an example of an enterprise-level conceptual technology architecture for a fictitious
company. The goal of the enterprise conceptual technology architecture is to illustrate the various types of
hardware components that are part of the enterprise infrastructure and the way they are laid out at a high
level.

GLOBAL ISSUES IN TECHNOLOGY

Design Reuse Broader Impacts


Focusing on reusing existing designs and technologies can save time and money, but it might also limit new
ideas and innovations because designers could stick too closely to what’s already been done. This approach
can have several effects:

• Socially, it might not meet the needs of all users, especially if the technology doesn’t consider different
cultures or lifestyles, leading to some people being left out.
• Ethically, there’s a question about fairness and whether technology serves everyone equally, as relying
on old designs may not address current or future challenges well.
• Environmentally, while using existing designs could reduce waste and save resources, it might also
keep using outdated, less eco-friendly technologies instead of developing cleaner, more efficient
options.
• Economically, countries that already have a lot of designs and technologies could get ahead because
they have more to reuse, making it harder for countries with fewer resources to catch up or compete.

While reusing designs has its benefits, it’s important to also think about these broader impacts and strive
for a balance between recycling old ideas and creating new ones to make sure technology keeps improving
in a way that’s good for everyone.

Figure 2.21 illustrates a possible physical architecture for the sample trading business model that was
introduced earlier in this section. This diagram depicts the layout of the actual hardware components that
make up the infrastructure of the trading solution. It also delineates where the functional blocks of the
application architecture are physically deployed. Note that this physical technology architecture leverages the
layout and components of the enterprise application architecture illustrated previously at the conceptual level.

Figure 2.21 The physical trading application architecture highlights the use of hardware to meet the organization’s goals.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

An alternative physical application architecture for the trading solution is shown in Figure 2.22. The diagram
does not delineate where the functional blocks of the application architecture are physically deployed.


Figure 2.22 This alternative physical trading application architecture outlines possible changes from the existing web solution.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Information Systems Architecture View


Architecture views are representations of the overall system design that matter to different stakeholders. In
addition to the application, data, and technology architecture models, IT architects create specific views to
communicate and ensure that the system meets the needs of various stakeholders. An architecture is typically
conveyed through one or more architecture models that collectively offer a clear description of the system’s
structure. A singular, all-encompassing model is often too intricate to be easily grasped, displaying every
intricate relationship among diverse business and technical elements.

Just as an architect designs different aspects of a building, such as wiring diagrams for electricians, floor plans
for owners, and elevations for planners to address their unique needs, the architecture of an information
system is similarly broken down into various views for effective communication among its stakeholders. In the
IT world, for example, an architect might create specific views of the system’s physical layout and security
measures. These tailored views ensure that stakeholders, each with their own concerns and areas of focus,
have a clear understanding of the system’s components relevant to their interests.

From Enterprise to Solution Architecture


After identifying business and technical characteristics through the diagrams discussed in the previous
section, solution models can be developed, and implementations can be created. This involves constructing
new components as necessary and combining them with reusable design components obtained from a
pattern catalog. If implementations of these reusable components already exist and can be customized, the
implementation of the solution becomes much faster. This approach helps avoid reinventing the wheel and
developing software components or systems that already exist, focusing instead on assembling existing
components for efficiency.

Breadth of Applicability of Models


Figure 2.23 illustrates the key dimensions for representing and categorizing architecture models,
accompanying diagrams, and related patterns. These architecture domains align with the TOGAF standard, as
discussed earlier. The diagram also illustrates the levels of abstraction to characterize various architectural
models. Additionally, it introduces the architecture scope as another dimension, classifying the models’
breadth of applicability at the enterprise, portfolio, or project level. The architecture scope is the extent and
boundaries within which architectural considerations, decisions, and solutions apply.

Figure 2.23 TOGAF architectural dimensions include various levels of abstraction. (attribution: Copyright Rice University, OpenStax,
under CC BY 4.0 license)

Portfolio- or domain-level architectures usually concentrate on collections of solutions and projects associated
with a specific business unit, like marketing or sales in a large organization. In contrast, project- or system-
level architecture is geared toward individual solutions and projects within those business units. Defining the

scope of a model is crucial because there exists a direct relationship between the scope and the level of detail
achievable in a blueprint. This is due to the necessity for increased generalization, such as simplification,
feature selection, and grouping, as the blueprint’s scope expands.

TECHNOLOGY IN EVERYDAY LIFE

Adaptive Design Reuse: Eco-Friendly Homes from Recycled Materials


Adaptive design reuse can significantly benefit people in everyday life by promoting efficiency,
sustainability, and improved user experiences. Imagine a neighborhood called EcoHomes, where all the
houses are built using old materials from buildings that were taken down or left unused. It is all about
making new homes without needing to produce or buy more materials, which helps the environment. In
EcoHomes, architects and builders take things like bricks, glass, and wood from old sites and use them to
build new, modern houses. For example, wooden beams from an old barn become part of the living room
in a new house, adding a cool, old-time feel to a modern design. Windows from an old office building let in
lots of sunlight, cutting down on the need for electric lights. EcoHomes is a hit because it shows how
reusing building materials can save money and help the planet. The people living there have lower energy
bills and are proud of their unique, eco-friendly homes. This story shows how using what we already have in
new ways can make a big difference for our wallets and the world.

2.3 Evolving Architectures into Useable Products


Learning Objectives
By the end of this section, you will be able to:
• Analyze similarities between architectures and apply patterns
• Discuss how to accelerate the creation of applications

The combination of top-down adaptive design reuse and bottom-up computational thinking optimizes
modern software development. This blend allows software developers to find a middle ground by adapting
and assembling existing components, minimizing the need for developing entirely new software. A clear
example of this cooperation is evident in modern websites, where the Model-View-Controller architectural
pattern is widely employed. The Model-View-Controller (MVC) is a software architectural pattern commonly
used in the design of interactive applications, providing a systematic way to organize and structure code. The
pattern separates an application into three interconnected components: model, view, and controller. The
model represents the application’s data structure and business logic, managing data and rules. The view is
responsible for displaying the user interface; it shows data to the user and sends user commands to the
controller. The controller serves as an intermediary between the model and the view. It processes user input
received from the view, interacts with the model to retrieve or update data, and determines the appropriate
view for presenting the response. Many practical web application frameworks, such as Django, have already
implemented the MVC pattern. In this setup, the application is divided into three parts: the model handles the
data structure, the view displays the data on web pages, and the controller manages the business logic,
facilitating interaction between the model and the view. Adding a broker pattern to MVC architectures can
improve the system’s scalability and flexibility when applicable and/or necessary. The broker acts as a
middleman that manages communication between different parts of the application, helping to handle more
data and complex operations efficiently.
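A minimal, framework-free sketch of the three MVC roles is shown below; the task-list example and all names
are illustrative and not tied to Django or any other framework:

```python
# Minimal sketch of the Model-View-Controller separation (illustrative names).

class Model:
    """Holds the application's data and business rules."""
    def __init__(self):
        self._tasks = []

    def add_task(self, title):
        if not title.strip():                 # a business rule kept in the model
            raise ValueError("Task title cannot be empty")
        self._tasks.append(title.strip())

    def all_tasks(self):
        return list(self._tasks)


class View:
    """Formats data for the user."""
    def render(self, tasks):
        return "\n".join(f"- {t}" for t in tasks) or "(no tasks)"


class Controller:
    """Mediates between the view and the model."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def handle_add(self, title):
        self.model.add_task(title)                          # update the data
        return self.view.render(self.model.all_tasks())     # choose what to show


controller = Controller(Model(), View())
print(controller.handle_add("Ship release 1.0"))
```

In a real framework, much of this wiring is provided for you; a broker, when added, would sit between the
controller and any remote services it depends on, routing requests on their behalf.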

Leveraging these existing frameworks enables developers to concentrate on crafting the specific logic relevant
to the website rather than reinventing the wheel. The beauty of this approach lies in the ability to swiftly piece
together solutions by extending and adapting the available frameworks. By doing so, developers streamline
the development process, enhance efficiency, and capitalize on the collective wisdom embedded in proven
frameworks, thereby fostering innovation in a more focused and resource-efficient manner.

Leveraging Architectural Similarities and Applying Patterns


The adaptive design reuse approach is a strategy in software development that emphasizes the efficient reuse
of existing design solutions to create new systems or applications. The beauty of the adaptive design reuse
approach is that the business solution architecture model helps create abstract representations of real
systems. Therefore, if there exist tangible realizations of the various components that are part of these
abstract representations, it is possible to implement the model and create specialized running business
solutions from it.

A solutions continuum is a strategy where existing solutions, components, or patterns are leveraged and
adapted for use in different contexts. Figure 2.24 illustrates the TOGAF model of reuse that is referred to as the
solutions continuum. As mentioned earlier, TOGAF does not provide a prescriptive approach to creating and/or
managing a catalog of patterns. However, various pattern repositories are available on the Internet and the
adaptive design technique can be used to avoid having to reinvent the wheel when architectural patterns and
related component implementations exist and can be customized and assembled with new components. More
information on this topic is provided in Chapter 10 Enterprise and Solution Architectures Management.

Figure 2.24 This TOGAF solutions continuum illustrates how each architecture guides and supports the others. (attribution:
Copyright Rice University, OpenStax, under CC BY 4.0 license)

As illustrated in Figure 2.25, the TOGAF solutions continuum offers a limited set of dimensions. It serves as a
guideline, and The Open Group allows interested parties to enhance the model by incorporating additional
dimensions that are relevant to their specific needs.


Figure 2.25 The TOGAF architecture continuum can suggest extensions but may only be able to focus on one aspect at a time.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Accelerating the Creation of Mainstream Business Solutions


To illustrate the power of architectural design and adaptive design reuse, various designs used in mainstream
business solutions are surveyed, followed by explanations as to how corresponding turnkey solutions can be
derived from these models. Several subsequent chapters of the book elaborate on building related solutions.

CONCEPTS IN PRACTICE

Object Management Architecture (OMA)


Organizations like the Object Management Group (OMG) create foundational and common system
architectures that may be used across industries. An example is the Object Management Architecture
(OMA), which is a foundation for developing architectures as building blocks. It then elaborates by providing
Object Services, Horizontal Facilities, and Vertical Facilities as subcomponents to help classify common
system architectures that may be used to assemble a complete OMA-centric architecture. It is then the
responsibility of the various industries to establish standard architectures that may be leveraged by
organizations that operate in these industries. Finally, organizations benefit from being able to leverage
foundational, common systems and industry architectures to develop their own proprietary architectures.
Based on the models of the various architectures that organizations may use and assuming there exists
solutions for them, organizations can develop their own solutions faster by reusing and customizing
existing solution components instead of reinventing the wheel. This is actually how the TOGAF solution
continuum applies adaptive design reuse.

Responsive Web 2.0 Business Solutions


World Wide Web Consortium (W3C) is an international community that develops guidelines to ensure the
long-term growth and accessibility of the World Wide Web. Web 2.0 is the second generation of the World
Wide Web, marked by the shift from static web pages to dynamic content. Web 3.0 is the third generation of the
World Wide Web and represents a vision for the future of the Internet characterized by advanced technologies.
Most modern websites rely on the Web 2.0 architectural model set forth by W3C. A sample logical application
architecture model is illustrated in Figure 2.26.

Figure 2.26 The logical application architecture of Microsoft Azure-hosted web applications allows for responsive web and mobile
solutions for users. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

In this case, the model leverages the various components available on the Microsoft Azure Cloud. Microsoft
Azure is a comprehensive cloud computing platform provided by Microsoft. Azure is designed to help
organizations build, deploy, and manage applications and services through a global network of data centers.
Azure provides streamlined development capabilities under its DevOps offering to make it very easy to develop
and quickly deploy websites on the Azure platform using mainstream web application frameworks (e.g.,
ASP.NET, PHP, Java). DevOps is an Agile, tools-driven software engineering approach that focuses on developing
software and deploying it into operation.

Many of the support components required to support website implementations are readily available on
Microsoft Azure and other systems that provide reusable components for responsive web design. It is easy to
evolve the model shown in Figure 2.26 into a running website. A web application framework has built-in
support for architectural patterns that make it easy to extend the framework and use plug-ins to implement
commercial-grade websites in a reasonable amount of time. Such frameworks also make it possible to build a
responsive web application so that the functionality available in the desktop version of the application is
seamlessly available on a mobile device. In addition to these capabilities, the adaptive design
reuse approach may be used to create the custom part of the web application. More information related to the
development of web solutions is provided in Chapter 9 Software Engineering, Chapter 10 Enterprise and
Solution Architectures Management, and Chapter 11 Web Applications Development.

THINK IT THROUGH

Architectural Similarities
What is one of the mechanisms that makes it possible to compare architectural similarities between two
solutions at different levels?

Native Mobile Business Solutions


A web application (web app) is a software application that is accessed and interacted with through a web
browser over the Internet. Many web-based solutions leverage the inherent capabilities of mobile devices,
offering web apps tailored for various types of phones in addition to responsive websites. Numerous
frameworks exist to facilitate the development of native web apps, streamlining the process of creating


applications that can run seamlessly on different mobile platforms. These frameworks often provide a unified
and efficient approach to building cross-platform mobile applications, ensuring a consistent user experience
across various devices.

In certain frameworks and development environments, React Native UI component libraries can be leveraged
to port web apps to mobile devices. Examples include React Native support for Android apps using
Android Studio (Android Studio provides a comprehensive environment for developing, testing, and
debugging Android apps) or iPhone apps using the Xcode IDE (Xcode is an integrated development
environment [IDE] developed by Apple for macOS that offers a suite of tools for building software for Apple
platforms, including macOS, iOS, watchOS, and tvOS). Figure 2.27 illustrates the logical application architecture
of mobile web apps that use React Native. In addition to these capabilities, the adaptive design reuse
approach may be used to create the custom part of the native web app. More information related to the
development of native web app solutions is provided in Chapter 9 Software Engineering, Chapter 10 Enterprise
and Solution Architectures Management, and Chapter 11 Web Applications Development.

Figure 2.27 The logical application architecture of React Native mobile web apps shows the back-end processes that allow both
Android and iOS customers to use the same application. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Native Mobile Business Examples

Native mobile apps are designed specifically for mobile operating systems, providing optimal performance
and a seamless user experience.

• WhatsApp: WhatsApp is a native mobile app designed specifically for iOS and Android platforms. It directly
accesses the hardware of the device, such as the GPS, camera, and microphone, which allows for features
like real-time location sharing, voice and video calls, and media sharing.
• Instagram: Instagram is a photo- and video-sharing app. Native development helps Instagram manage
high-quality media content efficiently, apply real-time filters, and smoothly handle in-app animations.
• Uber Eats: Uber Eats is a food-delivery service that operates as a native app on mobile devices. Being
native allows the app to use device-specific features, such as GPS for tracking the delivery person’s
location in real time.
• Spotify: Spotify uses its native app to deliver personalized music and podcast streaming services. The app’s
native nature allows it to integrate closely with the device’s hardware, offering features like offline
downloading, low-latency streaming, and background play.

Web 3.0 Business Solutions


The secure and transparent way of recording transactions that uses a chain of blocks, each storing a list of
encrypted transactions, is called blockchain. Once a block is full, it is linked to the previous one, forming a
chain. Blockchain technology decentralizes processing to ensure the integrity of transactions across multiple
computer nodes. This ensures that no single computer node gets assigned to processing transactions
repeatedly, thereby preventing possible fraudulent modifications of transactions. A smart contract is an
automated agreement written in code that runs on blockchain technology. Smart contracts enforce contract terms
automatically when specific conditions are met, removing the need for intermediaries and ensuring security.
The use of blockchain smart contracts within web applications is becoming more popular. The logical
application architecture model in Figure 2.28 illustrates how this is made possible by creating hybrid Web 2.0
websites that interact with Web 3.0 smart contracts.

Figure 2.28 The flowchart shows the logical application architecture of a Web 2.0 and Web 3.0 website. (attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)

Building these types of business solutions is greatly facilitated by the use of the Ethereum platform, an open-
source blockchain platform that enables the creation and execution of smart contracts and decentralized
applications, or by cloud blockchain platforms provided by cloud service providers such as Amazon
AWS, Google GCP, IBM Cloud, ConsenSys, Oracle Cloud, and others. These platforms provide frameworks and
APIs that make it easy to develop and deploy smart contracts. The Web 2.0 portion of the website can leverage
the frameworks mentioned earlier. In addition to these capabilities, the adaptive design reuse approach may
be used to create the custom part of the Web 3.0 application. More information related to the development of
Web 3.0 solutions is provided in Chapter 9 Software Engineering, Chapter 10 Enterprise and Solution
Architectures Management, and Chapter 13 Hybrid Multicloud Digital Solutions Development.
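
To make the chain-of-blocks idea concrete, here is a minimal Python sketch; the function names and sample
transactions are illustrative only, and real platforms such as Ethereum add consensus across many nodes and
smart contract execution on top of this basic structure. Each block stores the hash of the previous block, so
tampering with an earlier block breaks every link that follows it:

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's contents (its transactions plus its link to the previous block).
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def add_block(chain, transactions):
    # Link the new block to the previous one by storing the previous block's hash.
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"transactions": transactions, "prev_hash": prev_hash})

def is_chain_valid(chain):
    # The chain is valid only if every stored link matches the recomputed hash.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger = []
add_block(ledger, ["Alice pays Bob 5", "Bob pays Carol 2"])
add_block(ledger, ["Carol pays Dave 1"])
print(is_chain_valid(ledger))                        # True
ledger[0]["transactions"][0] = "Alice pays Bob 500"  # tamper with history
print(is_chain_valid(ledger))                        # False
```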


Cloud-Native Business Solutions


A way of building software by breaking it into small, independent pieces where each piece, or service, does a
specific job and works on its own is called microservices. A large number of businesses have been migrating
their legacy business solutions to the cloud to take advantage of microservices that are designed around
specific business functions and can be deployed independently using automated deployment systems. Figure
2.29 illustrates how secure, managed, and monetized APIs that are critical for a digital enterprise can be
created by leveraging a combination of API-led integration frameworks and cloud-native technologies. The use
of such frameworks and technologies helps streamline the migration of legacy business solutions. The process
of migrating legacy business solutions means upgrading or replacing old systems with newer, more efficient
ones. In addition to these capabilities, the adaptive design reuse approach may be used to create the custom
part of the cloud-native applications. More information related to the development of cloud-native solutions is
provided in Chapter 9 Software Engineering, Chapter 10 Enterprise and Solution Architectures Management,
and Chapter 12 Cloud-Native Applications Development.

Figure 2.29 The cloud-native application architecture view of a digital enterprise shows both client-side and server-side processes
through each layer. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Figure 2.29 illustrates the architecture of a digital platform that combines web and mobile applications with
blockchain technology. On the client side, users interact with the platform through web dashboards and
mobile apps, which communicate with the server using JSON and handle notifications via services like
Firebase, Huawei, and Apple. The server side includes an API layer that processes these requests, a caching
layer to improve performance, and a back-end logic layer responsible for application logic, backups, and
analytics. The architecture also features integration with public blockchain networks for enhanced security and
transparency, and it supports various notification services to keep users informed.

GLOBAL ISSUES IN TECHNOLOGY

Scale Transformations
The approach that consists of reinventing, rethinking, and rewiring solutions in various industries seems to
favor countries that have the means to perform broadscale transformations. This may have ethical, social,
and economic implications in other parts of the world.

Consider how advanced countries are rapidly adopting electric cars. They have the resources to reinvent
transportation by investing in electric vehicle (EV) technology, rethinking their energy use to reduce
pollution, and rewiring their infrastructure to support EV charging stations. This shift toward electric cars is
more challenging in less wealthy countries due to the high costs of EVs and the lack of charging
infrastructure. As a result, these countries may continue to depend on older, more polluting vehicles, facing
both environmental and economic disadvantages.

Innovative Cloud Mashups


Innovative cloud mashups refer to creative combinations of different innovative business solutions that
leverage disruptive technologies such as IoT, big data analytics, machine learning, blockchain, and others that
can be quickly assembled today as hybrid cloud applications. A hybrid cloud application combines the
benefits of both private and public clouds, allowing organizations to optimize their infrastructure based on
specific requirements.

Internet of Things (IoT) refers to the network of physical devices embedded with sensors, software, and
connectivity, enabling them to collect and exchange data. The process of examining, processing, and
extracting valuable insights from large datasets is called big data analytics. Developing algorithms that
enable computers to learn from data and make decisions without explicit programming is called machine
learning. Building such hybrid cloud applications is made possible by creating mashups of platform services
provided by various public cloud vendors to gain access to these disruptive technologies.

Figure 2.30 and Figure 2.31 illustrate models of solutions that are used today to support a variety of mobile
health (MHealth), body area networks (BANs), emotions monitoring, and social media applications. In addition
to the capabilities provided by the major public cloud platforms, the adaptive design reuse approach may be used to create the
custom part of these hybrid solutions. Google Maps and Zillow are prime examples of applications that utilize
location-based data to deliver valuable services. A GPS device identifies a user’s location, and that information
flows through the central network. Apps then display this data in a user-friendly manner, connecting users
with real-time geographic information in Google Maps or housing market details in Zillow. The integration of
GPS with other IoT systems allows for the seamless presentation of customized, location-specific content to
enhance the user experience. More information related to the development of web solutions is provided in
Chapter 9 Software Engineering, Chapter 10 Enterprise and Solution Architectures Management, and Chapter
13 Hybrid Multicloud Digital Solutions Development.


Figure 2.30 IoT devices use sensors, applications, and connectivity to interact, collect, and exchange data. (attribution: Copyright
Rice University, OpenStax, under CC BY 4.0 license)

Figure 2.31 This diagram depicts an architecture of a body area network. (credit: modification of "Body Area Network" by
"Jtel"/Wikimedia Commons, Public Domain)

Web 3.0 enables businesses to create more personalized and predictive services for users, fostering greater
trust and engagement by giving users control over their own data. For companies, this translates to new
opportunities for collaboration, innovation, and reaching consumers directly without intermediaries, ultimately
driving more efficient business models and creating value in ways that were not possible with earlier web
technologies.


Chapter Review
Key Terms
abstraction simplified representation of complex systems or phenomena
application architecture subset of the enterprise solution architecture; includes a process to architect and
design the application architecture as well as the actual application architecture model, which represents
the content of the application architecture
architectural pattern reusable solution to a recurring problem in software architecture design
architecture model represents the content of the application architecture
architecture scope extent and boundaries within which architectural considerations, decisions, and
solutions apply
automation using a program or computer application to perform repetitive tasks or calculations
big data analytics process of examining, processing, and extracting valuable insights from large datasets
blockchain secure and transparent way of recording transactions; uses a chain of blocks, each storing a list
of transactions
blueprint detailed plan or design that outlines the structure, components, and specifications of a building,
product, system, or process
business logic layer holds the business logic for the business solution or application
business model framework that outlines how a business creates, delivers, and captures value
business process hierarchy organizes a company’s activities from broad, general processes down to specific
tasks, making it easier to manage and improve how the business operates
computational thinking problem-solving and cognitive process rooted in principles derived from computer
science that involves breaking down complex problems into smaller, more manageable parts and devising
systematic approaches to solve them
data architecture model conceptual framework that outlines how an organization structures, organizes,
and manages its data assets
data management layer responsible for interacting with persistent storage systems like databases and
various data processing mechanisms
data modeling collaborative process wherein IT and business stakeholders establish a shared
understanding of essential business terms, known as entities, which typically end up being represented as
tables that contain data in a relational database management system
debugging finding and fixing of issues in code
decomposition solving of a complex problem by breaking it up into smaller, more manageable tasks
design component reusable element within a larger system that serves a specific purpose
EA domain represents business, data, application, and technology architectures
elementary business process (EBP) fundamental and indivisible activity within a business that is not further
subdivided into smaller processes
entity model represents the various objects or concepts and their relationships within a system
Ethereum platform open-source blockchain platform that enables the creation and execution of smart
contracts and decentralized applications
flowchart method for showing the flow and direction of decisions in a visual way using a diagram
function set of commands that can be repeatedly executed
heuristic form of a pattern that is well-known and considered a rule of thumb
hybrid cloud application combines the benefits of both private and public clouds, allowing organizations to
optimize their infrastructure based on specific requirements
Internet of Things (IoT) network of physical devices embedded with sensors, software, and connectivity,
enabling them to collect and exchange data
machine learning developing algorithms that enable computers to learn from data and make decisions
without programming

microservices way of building software by breaking it into small, independent pieces; each piece, or service,
does a specific job and works on its own
Microsoft Azure comprehensive cloud computing platform provided by Microsoft
migrating legacy business solutions upgrading or replacing old systems with newer, more efficient ones
Model-View-Controller (MVC) software architectural pattern commonly used in the design of interactive
applications, providing a systematic way to organize and structure code
monolithic structure system or application architecture where all the components are tightly integrated into
a single unit
presentation layer user’s touchpoint, handling the user interface (UI) and delivering the user experience
(UX), which encapsulates the overall feel and interaction a person has with a system or service
process map displays the order of chosen processes from the process hierarchies, highlighting their
connection to the roles responsible for executing them
pseudocode outline of the logic of algorithms using a combination of language and high-level programming
concepts
recursion programming and mathematical concept where a function calls itself during its execution
smart contract automated agreement written in code that runs on blockchain technology
solution architecture structural design that is meant to address the needs of prospective solution users
solutions continuum strategy where existing solutions, components, or patterns are leveraged and adapted
for use in different contexts
system architecture deals with application and data, and how they are related to each other and what
business process they support together
technical architecture includes the software and hardware capabilities to fully enable application and data
services
user experience (UX) overall experience that a person has when interacting with a product, service, or
system
Web 2.0 second generation of the World Wide Web when we shift from static web pages to dynamic content
Web 3.0 third generation of the World Wide Web and represents a vision for the future of the Internet
characterized by advanced technologies
web application (web app) software application that is accessed and interacted with through a web browser
over the Internet
web application framework built-in support for architectural patterns that make it easy to extend the
framework and use plug-ins to implement commercial-grade websites in a reasonable amount of time
World Wide Web Consortium (W3C) international community that develops guidelines to ensure the long-
term growth and accessibility of the World Wide Web

Summary
2.1 Computational Thinking
• Complex problems are situations that are difficult because they involve many different parts or factors.
• Computational thinking means breaking these problems into smaller parts, understanding how these
parts relate to each other, and then coming up with effective strategies or steps to solve each part.
• Computational thinking is a set of tools or strategies for solving (and learning how to solve) complex
problems that relate to mathematical thinking in its use of abstraction, decomposition, measurement, and
modeling.
• Characterization of computational thinking is the three As: abstraction, automation, and analysis.
• Decomposition is a fundamental concept in computational thinking, representing the process of
systematically breaking down a complex problem or system into smaller, more manageable parts or
subproblems.
• Logical thinking and pattern recognition are computational thinking techniques that involve the process of
identifying similarities among and within problems.
• Abstraction is a computational thinking technique that centers on focusing on important information
while ignoring irrelevant details.


• Algorithms are detailed sets of instructions to solve a problem step-by-step.
• Testing and debugging is about finding and fixing mistakes in the step-by-step instructions or algorithms
used to solve a problem.

2.2 Architecting Solutions with Adaptive Design Reuse in Mind


• Computational thinking commonly employs a bottom-up strategy for crafting well-structured
components.
• A business solution architecture is a structural design that is meant to address the needs of prospective
solution users.
• Business solutions are strategies/systems created to solve specific challenges in a business. Designing
business solutions can be described as a complex systemic process that requires expertise in various
spheres of technology as well as the concerned business. A blueprint is a detailed plan or design that
outlines the structure, components, and specifications of a building, product, system, or process.
• Two heuristics are inherent to the design of business solutions and the creation of business solution
architectures. Layering in business solution architecture involves creating distinct layers that abstract
specific aspects of the overall architecture. The layering approach relies on the principle of separation of
concerns. The presentation layer holds the user interface (UI) that interacts with the outside world.
• User experience (UX) refers to the overall experience that a person has when interacting with a product,
service, or system.
• A monolithic structure is a system or application architecture where all the components are tightly
integrated into a single unit.
• Enterprise-level architecture encompasses various domains that define the structure, components, and
operations of an entire organization. Enterprise architecture (EA) views the enterprise as a system or a
system of systems.
• The enterprise business architecture (EBA) is a comprehensive framework that defines the structure and
operation of an entire organization. A business model is a framework that outlines how a business creates,
delivers, and captures value. The organizational model is the structure and design of an organization,
outlining how roles, responsibilities, and relationships are defined.
• The business process is a series of interrelated tasks, activities, or steps performed in a coordinated
manner within an organization to achieve a specific business goal.
• Location model refers to a set of rules used to analyze and make decisions related to the positioning of
entities, activities, or resources.
• The enterprise technology architecture (ETA) is a comprehensive framework that defines the structure,
components, and interrelationships of an organization’s technology systems to support its business
processes and objectives.
• The application architecture is a subset of the enterprise solution architecture.
• A data architecture model is a conceptual framework that outlines how an organization structures,
organizes, and manages its data assets.
• Data modeling is the collaborative process wherein IT and business stakeholders establish a shared
understanding of essential business terms, known as entities.
• Architecture views are representations of the overall system design that matter to different stakeholders.

2.3 Evolving Architectures into Useable Products


• The combination of top-down, adaptive design reuse and bottom-up, computational thinking optimizes
modern software development.
• Model-View-Controller (MVC) is a software architectural pattern commonly used in the design of
interactive applications, providing a systematic way to organize and structure code.
• The adaptive design reuse approach is a strategy in software development that emphasizes the efficient
reuse of existing design solutions to create new systems or applications.
• World Wide Web Consortium (W3C) is an international community that develops guidelines to ensure the
long-term growth and accessibility of the World Wide Web.


• Web 2.0 is the second generation of the World Wide Web when we shift from static web pages to dynamic
content.
• Web 3.0 is the third generation of the World Wide Web and represents a vision for the future of the
Internet characterized by advanced technologies.
• A web application (web app) refers to a software application that is accessed and interacted with through a
web browser over the Internet.
• Blockchain is a secure and transparent way of recording transactions. It uses a chain of blocks, each
storing a list of transactions.
• Microservices is a way of building software by breaking it into small, independent pieces. Each piece, or
service, does a specific job and works on its own.
• Migrating legacy business solutions means upgrading or replacing old systems with newer, more efficient
ones.
• Innovative cloud mashups refer to creative combinations of different innovative business solutions that
leverage disruptive technologies.

Review Questions
1. What term is a problem-solving and cognitive process rooted in principles derived from computer science
that involves breaking down complex problems into smaller, more manageable parts and devising
systematic approaches to solve them?
a. abstraction
b. decomposition
c. computational thinking
d. recursion

2. What shape in a flowchart represents a decision point?


a. oval
b. parallelogram
c. rectangle
d. diamond

3. What does pseudocode spell out in natural language?


a. an algorithm
b. a test case to debug
c. a flowchart
d. the programming language of choice

4. After a test case fails, what is the next step to determine the cause of the failure?

5. What are the key elements of CT that distinguish CT from other types of problem-solving strategies?

6. What is the primary difference between a heuristic and a pattern?


a. A heuristic is a general rule used for quick problem-solving when an exact solution is not possible,
while a pattern is a repeatable solution to a commonly occurring problem.
b. A heuristic and a pattern are both specific guidelines used to achieve exact solutions in complex
problems.
c. A heuristic is used for creating new designs, whereas a pattern refers to repeating decorative
motifs.
d. There is no difference; both terms refer to specific scientific methods used in research.

7. What level of architecture is described as having a narrower scope, a detailed blueprint, and a lower level
of abstraction?
a. system architecture
b. technical architecture
c. enterprise architecture
d. solution architecture

8. What level of architecture is described as having a wider scope, a vague plan for the entire organization,
and a higher level of abstraction?
a. system architecture
b. technical architecture
c. enterprise architecture
d. solution architecture

9. What component holds the business logic for the business solution or application?
a. presentation layer
b. data management layer
c. business logic layer
d. business process hierarchy

10. What is the difference between data, information, knowledge, and wisdom?

11. Explain why an information system architecture is considered an architecture view in TOGAF.

12. Once architectural similarities have been identified between the architecture of a new problem and
existing architectural solutions, what is required to apply these patterns effectively?
a. a comprehensive understanding of the new problem’s requirements and constraints
b. the ability to modify existing patterns to fit the new problem’s unique context
c. both a comprehensive understanding of the new problem’s requirements and the ability to modify
existing patterns
d. approval from a higher authority to use the identified patterns

13. In the Model-View-Controller, what layer is responsible for acting as an intermediary between two layers?
a. view
b. model
c. business logic
d. controller

14. What does Web 3.0 provide that Web 2.0 did not?
a. dynamic web pages as opposed to only static web pages
b. represents a vision for the future of the Internet characterized by advanced technologies
c. shifted from HTTP to HTTPS
d. based on JSON as opposed to HTML

15. A smart home with a thermostat, a refrigerator, and lights that all can be controlled remotely is an
example of devices that can be described with what terminology?
a. Internet of Things (IoT)
b. machine learning
c. hybrid cloud application
d. solutions continuum

16. Why doesn't TOGAF provide prescriptive methods to create and manage repositories of architectural
patterns?

17. What is a responsive web application?

18. What is a cloud mashup?

Conceptual Questions
1. Suppose you plan to meet with your friends at a location you are unfamiliar with. In what ways could you
employ computational thinking to efficiently navigate and locate the meeting spot?

2. Explain how the pyramid of knowledge concept helps describe the learning progress you make when
reading a textbook.

3. What are specific examples of business architecture similarities between two banks?

4. What are specific examples of technology architecture similarities between two banks?

Practice Exercises
1. Think of a complex problem—one that can be broken into many layers of smaller problems. Explain how
computational thinking could help you develop a solution to your complex problem.

2. Look at the following pseudocode that describes an algorithm to make a peanut butter and jelly sandwich:
a. Get the peanut butter.
b. Get the jelly.
c. Get the bread.
d. Open the peanut butter jar.
e. Open the jelly jar.
f. Open the bread.
g. Take out slice of bread.
h. Take out another slice of bread.
i. Dip the knife into the peanut butter.
j. Spread the peanut butter on one slice of bread.
k. Dip the knife into the jelly.
l. Spread the jelly on the other slice of bread.
m. Put the two slices of bread together.

Write a new algorithm that utilizes abstraction to simplify the number of steps of the original algorithm
and can be used as a pattern to make any sandwich.

3. Research what the Fibonacci number sequence is. Write the pseudocode to compute the nth number in
the Fibonacci number sequence. Utilize recursion to model a pattern of computation.

4. Create a model that describes the business of running your daily life. Please note that this is not
suggesting that you should run your life as a business. Hint: To answer this question, think about the
various players, locations, and processes involved in your daily activities and create simple models that
mimic the structure provided for the trading business model in the current chapter section.

5. Draw an application architecture diagram for a business solution that uses smart contracts for payment
and transactions logging purposes. Feel free to leverage some of the figures from this chapter, rather than
create something new.

Problem Set A
1. Create an algorithm to explain to a robot how to cross a street. Use computational thinking to break down
the problem into smaller parts. Use the following information to guide your thinking.


Task: Crossing the road
Decomposition: Vehicles, actions, decision
Pattern Recognition: Identify the different considerations you can group together to form a pattern of what needs to be done.
Abstraction: Act out crossing the road. Do you do something differently from someone else?
Algorithm: Write your instructions in either pseudocode or as a flowchart.

2. Create an algorithm to explain how to bake a four-tiered wedding cake.

3. Reflect on what happens when you try to figure out driving directions from point A to point B.

4. Create an enterprise architecture business model for an insurance company that specializes in insuring
home and car owners.

5. Create two alternative enterprise technology architecture models for the insurance company business
model created in the previous question.

6. Draw an application architecture diagram for the Web 2.0 responsive website of a fictitious insurance
company that focuses on home and car insurance and assume that the company also provides native apps
to its customers in addition to the website.

Problem Set B
1. Create an algorithm to explain to a robot how to play a game of rock paper scissors. Use computational
thinking to break down the problem into smaller parts. Use the following information to guide your
thinking.

Task: Rock, paper, scissors
Decomposition: Actions, choices, timings, winning conditions
Pattern Recognition: Identify the different considerations you can group together to form a pattern of what needs to be done.
Abstraction: Play the game. Think of the actions you perform.
Algorithm: Write your instructions in either pseudocode or a flowchart.

2. Create an algorithm to show a robot how to play a game of tic-tac-toe. Use computational thinking to
break down the problem into smaller parts. Use the following information to guide your thinking.

Task: Tic-tac-toe
Decomposition: Moves that can be made, winning conditions
Pattern Recognition: Identify the different considerations you can group together to form a pattern of what needs to be done.
Abstraction: Play against someone. What strategies do you use in order to win?
Algorithm: Write your instructions in either pseudocode or a flowchart.

3. Perform some research on the Internet to piece together enterprise architectures for as many industries
as you can think of.

4. Draw a cloud-native application architecture diagram for the trading business and technical model
documented in the previous section of this chapter.

5. A company wants to develop a business solution that takes pictures of the license plates of cars that drive
too fast through intersections in a given city, sends tickets to the drivers, and manages ticket payments.
Draw an innovative cloud mashup application architecture diagram for such a solution. Please note that
IoT, machine learning, and blockchain PaaS services should be used as part of your design.

6. Document the architecture of a pattern catalog that could be used to provide access to solution
architecture diagrams that would help accelerate the creation of mainstream business solutions.

Thought Provokers
1. Consider TechWorks, which is 100% committed to leveraging innovative technologies as a business growth
facilitator. Describe how it can best use computational thinking to create products or services that can
generate business. Give precise examples and explain how the start-up would be able to scale the
resulting business (i.e., keep sustaining the cost of doing business while increasing its number of
customers). Hint: Some companies leverage an incubation arm to come up with innovative ideas and then
accelerate the process of developing these ideas into practical solutions via a solution accelerator.

2. Consider our start-up company that is 100% committed to leveraging innovative technologies as a
business growth facilitator. Describe how it can best use adaptive design reuse to create products or
services that can generate business. Give precise examples and explain how the start-up would be able to
scale the resulting business (i.e., keep sustaining the cost of doing business while increasing its number of
customers). Hint: The company may decide to sell reusable design models and their implementation from
a proprietary catalog; it may also focus on providing consulting services to derive complete solutions from
its proprietary models.

3. Consider our start-up company that is 100% committed to leveraging innovative technologies as a
business growth facilitator. Describe how it can best leverage evolving architectures into usable products
to create products or services that can generate business. Give precise examples and explain how the
start-up would be able to scale the resulting business (i.e., keep sustaining the cost of doing business
while increasing its number of customers).

Labs
1. Perform some research on the Internet to find examples of problem scenarios that computational
thinking may help solve and create a catalog of problem scenarios. Then, elaborate and show practically
how this catalog may be used to compare the scenarios and classify them so they may be used as part of
your pattern discovery as you apply computational thinking to new problem scenarios.

2. Create an enterprise architecture capability model for a company of your choice using your research from
problem set B. Then, expand one of the capabilities and provide business and technology architecture
models for it; identify a project within the capability you expanded upon and provide a complete solution
architecture for it.

3. Perform some research on the Internet to piece together additional solutions architecture diagrams for
the various categories of mainstream solutions covered in this chapter section. This should include
application, data, and technology diagrams.

4. Catalog additional types of solution architectures that may be used to accelerate the creation of
mainstream business solutions.

5. Apply critical thinking strategies to develop a study plan for your current semester’s courses, aiming to
achieve an A or pass each course.



3
Data Structures and Algorithms

Figure 3.1 Online mapping applications represent places, locations, and map data while providing functionality to look around,
search for places, and get navigation directions. The right combination of data structures to manage collections of places, locations,
and map data along with efficient search and navigation algorithms will help optimize the experience of users trying to find their
way through the map and will also make optimal use of computing resources. (attribution: Copyright Rice University, OpenStax,
under CC BY 4.0 license; data source: OpenStreetMap under Open Database License; attribution: Copyright Rice University,
OpenStax, under CC BY 4.0 license)

Chapter Outline
3.1 Introduction to Data Structures and Algorithms
3.2 Algorithm Design and Discovery
3.3 Formal Properties of Algorithms
3.4 Algorithmic Paradigms
3.5 Sample Algorithms by Problem
3.6 Computer Science Theory

Introduction
Online maps help people navigate a rapidly changing world. It was not long ago that maps were on paper and
that knowledge came from non-digital, trusted sources. In this chapter, we will study how computer scientists
design and analyze the foundational structures behind many of today’s technologies. Data structures and
algorithms are not only foundational to map apps, but also enable an amazing variety of other
technologies. From self-driving cars to inventory management to simulating the movement of galaxies to transferring
data between computers—all these applications use data structures and algorithms to efficiently organize and
process large amounts of information.

3.1 Introduction to Data Structures and Algorithms


Learning Objectives
By the end of this section, you will be able to:
• Understand the difference between algorithms and programs
• Relate data structures and abstract data types
• Select the data structure that is appropriate to solve a given problem practically

Computer science is the study of computers and computational systems that involve data representation and
process automation. Owing to their historical roots as calculators, computers can easily represent numerical
data. Calculators rely on algorithms to add, subtract, multiply, and divide numbers. But what about more
complex data? How do computers represent complex objects like graphs, images, videos, or sentences? What
complications arise when we represent or process data in certain ways? These are some of the foundational
questions that computer scientists and programmers focus on when designing software and applications that
we use to solve problems.

A data type determines how computers process data by defining the possible values for data and the possible
functionality or operations on that data. For example, the integer data type is defined as values from a certain
range of positive or negative whole numbers with functionality including addition, subtraction, multiplication,
and division. The string data type is defined as a sequence of characters where each character can be a letter,
digit, punctuation, or space, with functionalities that include adding or deleting a character from a string,
concatenating strings, and comparing two strings based, for example, on their alphabetical order.
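
As a brief illustration, the following Python snippet (the variable names and sample values are arbitrary)
exercises a few of the operations just described for the integer and string data types:

```python
# Integer data type: whole-number values with arithmetic operations.
count = 7
total = count + 3        # addition
product = count * 2      # multiplication

# String data type: a sequence of characters with its own operations.
greeting = "Hello, " + "world"        # concatenating two strings
trimmed = greeting.replace(",", "")   # deleting a character from a string
print("apple" < "banana")             # comparing strings by alphabetical order: True
```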

Data types like strings are an example of abstraction, the process of simplifying a concept in order to
represent it in a computer. The string data type takes a complex concept like a sentence and represents it in
terms of more basic data that a computer can work with. When a computer compares two strings, it is really
comparing the individual numerical character codes (see Chapter 5 Hardware Realizations of Algorithms:
Computer Systems Design) corresponding to each pair of characters within the two strings.

In this section, we will learn how to solve problems by choosing abstractions for complex data. We will see that
just as our data grows more complex, so do our algorithms.

Introduction to Algorithms
An algorithm is a sequence of precise instructions that operate on data. We can think of recipes, plans, or
instructions from our daily lives as examples of algorithms. Computers can only execute a finite pre-defined
set of instructions exactly as instructed, which is why programming can feel like such a different way of
communicating as compared to our natural human languages. A program is an implementation (realization)
of an algorithm written in a formal programming language.
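
To illustrate the distinction, consider the everyday algorithm "scan a list of numbers and remember the
largest value seen so far." The following Python function is one possible program implementing that algorithm
(the function name and data are made up for this example); the same algorithm could just as well be
implemented in any other programming language:

```python
def find_largest(numbers):
    # Remember the largest value seen so far while scanning the list.
    largest = numbers[0]
    for value in numbers[1:]:
        if value > largest:
            largest = value
    return largest

print(find_largest([12, 5, 42, 7]))  # 42
```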

Although each programming language is different from all the others, there are still common ideas across all
of them. Knowing just a few of these common ideas enables computer scientists to address a wide variety of
problems without having to start from scratch every single time. For example, the abstraction of string data
enables programmers to write programs that operate on human-readable letters, digits, punctuation, or
spaces without having to determine how to delve into each of these concepts. Programming languages allow
us to define abstractions for representing ideas in a computer (see Chapter 4 Linguistic Realization of
Algorithms: Low-Level Programming Languages for more).

The study of data structures and algorithms focuses on identifying what is known as a canonical algorithm: a
well-known algorithm that showcases design principles helpful across a wide variety of problems. In this
chapter, rather than focusing on the programming details, we will instead focus on algorithms and the ideas
behind them.

Understanding Data Structures


For many real-world problems, the ability to design an algorithm depends on how the data is represented. A
data structure is a complex data type with two equally important parts:

1. a specific representation or way of organizing a collection of more than one element, which is an
individual value or data point, and
2. a specific functionality or operations such as adding, retrieving, and removing elements.

In our previous example, a string is a data structure for representing sentences as a sequence of characters. It
has specific functionality such as character insertion or deletion, string concatenation, and string comparison.


Although the values for complex data are often diverse, computer scientists have designed data structures so
that they can be reused for other problems. For example, rather than designing a specialized data structure
for sentences in every human language, we often use a single, universal string data structure to represent
sentences, including characters from different languages in the same sentence. (We will later see some
drawbacks of generalizing assumptions in the design of data structures and algorithms.) In addition,
computers take time to execute algorithms, so computer scientists are concerned about efficiency in terms of
how long an algorithm takes to compute a result.

Among the different types of universal data structures, computer scientists have found it helpful to categorize
data structures according to their functionality without considering their specific representation. An abstract
data type (ADT) consists of all data structures that share common functionality but differ in specific
representation.

Common abstract data types for complex data follow; list and set types are shown in Figure 3.2, and a short
code sketch after the figure illustrates several of them. We will discuss each abstract data type in more detail
together with its data structure implementations.

• A list represents an ordered sequence of elements and allows adding, retrieving, and removing elements
from any position in the list. Lists are indexed because they allow access to elements by referring to the
element's index, which is the position or address for an element in the list. For example, a list can be used
to represent a to-do list, where each item in the list is the next task to be completed in chronological order.
• A set represents an unordered collection of unique elements and allows adding, retrieving, and removing
elements from the set. Sets typically offer less functionality than lists, but this reduction in functionality
allows for more efficient data structure representations. For example, a set can be used to represent the
names of all the places that you want to visit in the future.
• A map represents unordered associations between key-value pairs of elements, where each key can only
appear once in the map. A map is also known as a dictionary since each term (key) has an associated
definition (value). Maps are often used in combination with other data structures. For example, a map can
be used to represent a travel wish list: each place that you want to visit in the future can be associated
with the list of things that you want to do when you arrive at a given place.
• A priority queue represents a collection of elements where each element has an associated priority value.
In addition to adding elements, priority queues focus on retrieving and removing the element with the
highest priority. For example, a priority queue can be used to represent inpatient processing at a hospital
emergency room: the patients with more urgent need for care may be prioritized and dealt with first.
• A graph represents binary relations among a collection of entities. More specifically, the entities are
represented as vertices in the graph, and a directed or undirected edge is added between two vertices to
represent the presence or absence of a certain relation. For example, a friendship graph can be used to
represent the friendship relations between people, in which case an undirected edge is added between
two persons if they are friends. Graphs allow operations such as adding vertices and edges, removing
vertices and edges, and retrieving all edges adjacent to a given vertex.

Figure 3.2 Lists and sets are common abstract data types used to represent complex data and can be in the form of integers or
string data. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
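
Many programming languages come with data structures that implement these abstract data types. The short
Python sketch below (the sample data is made up) uses the built-in list, set, and dictionary types to model the
to-do list, travel wish list, and place-to-activities map described above; priority queues and graphs are covered
later in the chapter along with the data structures that commonly implement them:

```python
# List: an ordered sequence of elements accessed by index (position).
to_do = ["buy groceries", "write report", "call plumber"]
to_do.append("pay rent")          # add an element at the end
first_task = to_do[0]             # retrieve the element at index 0

# Set: an unordered collection of unique elements.
places_to_visit = {"Kyoto", "Nairobi", "Lima"}
places_to_visit.add("Kyoto")      # duplicates are ignored
print("Lima" in places_to_visit)  # membership test: True

# Map (dictionary): unordered associations between keys and values.
travel_wish_list = {
    "Kyoto": ["visit temples", "try ramen"],
    "Lima": ["see the historic center"],
}
travel_wish_list["Nairobi"] = ["go on safari"]  # add a key-value pair
print(travel_wish_list["Kyoto"])                # look up the value for a key
```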

Selecting a Data Structure


Since data representation is a fundamental task in designing algorithms that solve problems, how do we select
data structures for a particular problem? Computer scientists apply a top-down approach.

1. Select an appropriate abstract data type by analyzing the problem to determine the necessary
functionality and operations.
2. Select an appropriate data structure by quantifying the resource constraints (usually program running
time) for each operation.

The primary concern is the data and the operations to be performed on them. Thinking back to simple data
types like numbers, we focused on addition, subtraction, multiplication, and division as the basic operations.
Likewise, to represent complex data types, we also focus on the operations that will most directly support our
algorithms. After deciding on an abstract data type, we then choose a particular data structure that
implements the abstract data type.

Linear Data Structures


If a problem can be solved with an ordered sequence of elements (e.g., numbers, payroll records, or text
messages), the simplest approach might be to store them in a list. Some problems require that actions be
performed in a strict chronological order, such as processing items in the order that they arrive or in the
reverse order. In these situations, a linear data structure, which is a category of data structures where
elements are ordered in a line, is appropriate. There are two common implementations for the list abstract data
type. The first, an array list (Figure 3.3), is a data structure that stores list elements next to each other in
memory. The other is a linked list (Figure 3.4), which is a list data structure that does not necessarily store list
elements next to each other, but instead works by maintaining, for each element, a link to the next element in
the list. Both array lists and linked lists are linear data structures because their elements are organized in a
line, one after the other. An advantage of array lists is that they allow (random) access to every element in the
list in a single step. This is in sharp contrast with linked lists, which only support sequential access, meaning
that elements must be reached by following links one at a time from the start of the list. On the
other hand, linked lists support fast insertion and deletion operations, which array lists do not.

Figure 3.3 An array list stores elements next to each other in memory in the exact list order. (attribution: Copyright Rice University,
OpenStax, under CC BY 4.0 license)

Figure 3.4 A linked list maintains a link for each element to the next element in the list. (attribution: Copyright Rice University,
OpenStax, under CC BY 4.0 license)
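
A minimal Python sketch of the linked-list idea follows (the class and element names are illustrative). Each
node stores one element plus a link to the next node, so reaching an element means following links
sequentially, while inserting after a known node only requires rewiring two links:

```python
class Node:
    def __init__(self, element, next_node=None):
        self.element = element   # the value stored in this node
        self.next = next_node    # link to the next node (None marks the end)

# Build the linked list "red" -> "green" -> "blue".
head = Node("red", Node("green", Node("blue")))

# Sequential access: start at the head and follow the links.
current = head
while current is not None:
    print(current.element)
    current = current.next

# Fast insertion after the first node: rewire two links, nothing is shifted.
head.next = Node("yellow", head.next)
```

By contrast, an array list can jump directly to any index in a single step, but inserting into the middle requires
shifting the elements that come after the insertion point.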

Earlier, we introduced sets as offering less functionality than lists. Both array lists and linked lists can also
implement the set abstract data type. Sets differ from lists in two ways: sets are unordered—so elements are
not assigned specific positions—and sets only consist of unique elements. In addition to implementing sets,
linear data structures can also implement the map, priority queue, and graph abstract data types. If linear data
structures can cover such a wide range of abstract data types, why learn other data structures? In theory, any
complex data can be represented with an array list or a linked list, although it may not be optimal, as we will
explain further.

One drawback of relying only on linear data structures is related to the concept of efficiency. Even if linear data
structures can solve any problem, we might prefer more specialized data structures that can solve fewer
problems more efficiently, and help represent real world data arrangements more closely. This is particularly
useful when we have large amounts of data like places or roads in an online map of the entire world. Linear
data structures ultimately organize elements in a line, which is necessary for implementing lists but not
necessary for other abstract data types. Other data structures specialize in implementing sets, maps, and
priority queues by organizing elements in a hierarchy rather than in a line.

Tree Data Structures


A tree is a hierarchical data structure. While there are many kinds of tree data structures, all of them share the
same basic organizing structure: a node represents an element in a tree or graph. A node may or may not
have a descendant. A child node is a descendant of another node. Often, the primary node is referred to as
the “parent node.” Trees maintain a hierarchy through parent-child relationships, which repeat from the root
node at the top of the tree down to each leaf node, which is at the bottom of the tree and has no children. The
height of a tree corresponds to the depth of the hierarchy of descendants. Figure 3.5 illustrates the structure
and elements of a tree.

Figure 3.5 A tree is a hierarchical data structure with nodes where each node can have zero or more descendant child nodes.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Binary Search Trees

A binary search tree is a kind of tree data structure often used to implement sets and maps with the binary
tree property, which requires that each node can have either zero, one, or two children, and the search tree
property, which requires that elements in the tree are organized least-to-greatest from left-to-right. In other
words, the values of all the elements of the left subtree of a node have a lesser value than that of the node.
Similarly, the values of all the elements of the right subtree of a node have a greater value than that of the
node. The search tree property suggests that when elements are read left-to-right in a search tree, we will get
the elements in sorted order. For numbers, we can compare and sort numbers by their numeric values. For
more complex data like words or sentences, we can compare and sort them in dictionary order. Binary search
trees use these intrinsic properties of data to organize elements in a searchable hierarchy (Figure 3.6).

Figure 3.6 A binary search tree organizes elements least to greatest from left to right. (attribution: Copyright Rice University,
OpenStax, under CC BY 4.0 license)

The tree illustrated satisfies the search tree property based on the natural alphabetical order between letters
since the elements in the tree are organized least to greatest from left to right. In other words, for a given list
of letters (A, B, C, D, E, F, G), start at the middle of the list with D (the root node), then pick B as the left
sub-node of D, which is at the middle of the sublist of letters (A, B, C) that is to the left of D, and pick F as the right
sub-node of D, which is at the middle of the sublist of letters (E, F, G) that is to the right of D. Finally, organize the
remaining letters under sub-nodes B and F so that they remain ordered least-to-greatest from left to right.

The search tree property is responsible for efficiency improvements over linear data structures. By storing
elements in a sorted order in the search tree rather than in an indexed order in a list, binary search trees can
more efficiently find a given element. Consider how we might look up words in a dictionary. A binary search
tree dictionary storing all the terms and their associated definitions can enable efficient search by starting at
the middle of the dictionary (the root node) before determining whether to go left or right based on whether
we expect our word to appear earlier or later in the dictionary order. If we repeat this process, we can
repeatedly rule out half of the remaining elements each time. Searching for a term in a list-based dictionary
that is not sorted, on the other hand, would require us to start from the beginning of the list and consider
every word until the end of the list since there is no underlying ordering structure to the elements.
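
The lookup process just described can be sketched in a few lines of Python (the node class and letter data
mirror the A-G example above and are illustrative only). At each node, the search compares the target with the
node's key and continues left or right, ruling out one subtree each time:

```python
class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def contains(node, target):
    # Search tree property: smaller keys are in the left subtree,
    # larger keys are in the right subtree.
    if node is None:
        return False
    if target == node.key:
        return True
    if target < node.key:
        return contains(node.left, target)
    return contains(node.right, target)

# The tree from the letters example: D at the root, B and F one level down.
root = TreeNode("D",
                TreeNode("B", TreeNode("A"), TreeNode("C")),
                TreeNode("F", TreeNode("E"), TreeNode("G")))

print(contains(root, "E"))  # True
print(contains(root, "Z"))  # False
```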

Balanced Binary Search Trees

Binary search trees are not always as efficient as we have described. The dictionary example represents a best-case
scenario for binary search trees.
search tree is perfectly balanced, which means that for every node in the binary search tree, its left and right
subtrees contain the same number of elements. This is a strong requirement, since the order in which
elements are added to a binary search tree determines the shape of the tree. In other words, binary search
trees can easily become unbalanced. It is possible for a binary search tree to look exactly like a linked list, in
which each node contains either zero children or one child, which is no more efficient than a linear data
structure.

An AVL tree (named after its inventors, Adelson-Velsky and Landis) is a balanced binary search tree data
structure often used to implement sets or maps with one additional tree property: the AVL tree property,
which requires the left and right subtrees to be balanced at every node of the tree. AVL trees are just one
among many “self-balancing” binary search trees. A balanced binary search tree introduces additional
properties that ensure that the tree reorganizes elements to maintain balance (Figure 3.7).


Figure 3.7 An AVL tree rotates nodes in a binary search tree to maintain balance. This sequence of steps illustrates the insertion of
numbers 1, 2, 3, 4, 5, 6 into an initially empty AVL tree. (The steps in which rotation occurs are represented by the solid black arrows.)
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Balanced binary search trees such as AVL trees represent just one approach for ensuring that the tree never
enters a worst-case situation. There are many other balanced binary search tree data structures in addition to
AVL trees. Balanced binary search trees can also be used to implement the priority queue abstract data type if
the elements are ordered according to their priority value. But balanced search trees are not the only way to
implement priority queues.
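
A simple way to test whether a binary search tree meets the AVL balance condition is to compute subtree
heights recursively. The Python sketch below is illustrative only and far less efficient than a real AVL tree,
which stores heights in its nodes and rebalances with rotations as elements are inserted; it checks that the left
and right subtree heights differ by at most one at every node, which is how AVL trees define balance:

```python
class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    # The height of an empty tree is 0; otherwise 1 plus the taller subtree's height.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_avl_balanced(node):
    # The balance condition must hold at every node, not just at the root.
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_avl_balanced(node.left) and is_avl_balanced(node.right)

# A degenerate tree shaped like a linked list (1 -> 2 -> 3) is not balanced.
degenerate = TreeNode(1, None, TreeNode(2, None, TreeNode(3)))
print(is_avl_balanced(degenerate))  # False
```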

Binary Heaps

Because priority queues focus on retrieving and removing the highest-priority elements first, adding an element to a
priority queue also involves specifying an associated priority value that is used to determine which elements
are served next. For example, patients in an emergency room might be served according to the severity of
their health concerns rather than according to arrival time. A binary heap is a type of binary tree data
structure that is also the most common implementation for the priority queue abstract data type (Figure 3.8).
A binary heap is not a search tree, but rather a hybrid data structure between a binary tree and an array list.
Data is stored as an array list in memory, but the binary heap helps visualize data in the same way that a
binary tree does, which makes it easier to understand how data are stored and manipulated. Binary heaps
organize elements according to the heap property, which requires that the priority value of each node in the
heap is greater than or equal to the priority values of its children. The heap property suggests that the
highest-priority element will always be the root node where it is efficient to access.

Figure 3.8 A binary heap is the most common implementation of the priority queue abstract data type. The priority value of each
node in the binary heap is greater than or equal to the priority values of the children. Note that the value stored in the root node of
the right subtree can be smaller than the value stored in any node in the left subtree, while not violating the heap property.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

CONCEPTS IN PRACTICE

Tracking Earthquakes
Earthquakes, hurricanes, tsunamis, and other natural disasters occur regularly, and often demand an
international response. How do we track natural disasters and identify the most affected areas in order to
coordinate relief and support efforts? In the United States, the U.S. Geological Survey (USGS) is responsible
for reporting earthquakes using thousands of earthquake sensors. However, that still leaves many places
without earthquake sensors. Outside the United States, sensor technology may be less robust or
inaccessible.

Social network data can be used to enhance this information and more quickly alert governments about
natural disasters in real-time. By monitoring public social network platforms for occurrences of short posts
such as “earthquake?,” we can quickly localize earthquakes based on the user’s geolocation data. However,
aggregating and understanding this data—often thousands of data points arriving in minutes—requires
efficient data structures and algorithms. We can use a binary heap that implements the priority queue
abstract data type for an earthquake-tracking program. For each "earthquake?" post received at a given
geolocation, we can increase the priority of that location, which helps identify the locations where an
earthquake is most likely occurring. At any time, we can efficiently retrieve the
highest-priority element from the priority queue. By choosing to use a binary heap rather than a linear data
structure for implementing the priority queue, we can ensure that the earthquake-tracking program is able
to keep up with the thousands of posts made every minute during an earthquake.

Graph Data Structures


Both binary search trees and binary heap data structures represent more efficient ways to implement sets,
maps, and priority queues by organizing data according to their intrinsic properties. In both cases, the
properties of data enable efficient addition, retrieval, and removal of elements.

Graphs are a different kind of abstract data type. Rather than focusing on addition, retrieval, and removal,
graphs focus on explicitly modeling the relationships between elements. Graphs afford access not only to
elements, but also to relationships between elements.

• A vertex (also called a node) represents an element in a graph or in a specialized kind of graph, such as a
tree.

• An edge represents a relationship between two vertices. Optionally, edges can have associated weights. In a
graph abstract data type, two vertices connected by an edge are considered adjacent.
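
One common way to store a graph in a program is an adjacency list, which maps each vertex to its adjacent
vertices along with any edge weights. The short Python sketch below is our own illustration (the campus-map
vertices and weights are made up, not taken from this text):

# A small undirected graph stored as an adjacency list.
# Each vertex maps to a dictionary of adjacent vertices and edge weights.
campus_map = {}

def add_edge(graph, u, v, weight=1):
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight   # undirected: record both directions

add_edge(campus_map, "Library", "Dorm", weight=5)        # a 5-minute footpath
add_edge(campus_map, "Library", "Lab", weight=3)
add_edge(campus_map, "Dorm", "Dining Hall", weight=2)

print(campus_map["Library"])           # {'Dorm': 5, 'Lab': 3}
print("Lab" in campus_map["Dorm"])     # False: Dorm and Lab are not adjacent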


LINK TO LEARNING

Visualgo is a website that provides animations to help users learn more about algorithms and data
structures. They have exercises to help understand several concepts presented in this chapter. You can
access animations on various data structures and algorithms (https://openstax.org/r/76Visualgo) such as a
linked list, a binary search tree, and graph structures.

In computer networks such as the Internet, graphs can represent individual network routers as nodes with
data packets flowing between directly connected routers along edges. Even though not every router is directly
connected to every other router, the router at the current node can analyze an incoming data packet to
determine which edge it should travel through next. By repeating this process, a data packet can travel from a
router on the Internet to another router on the Internet even though the two routers are not directly adjacent
to each other.

Graphs are unique in that they can directly represent a wide variety of real-world problems, such as the
following:

• A social network, where each vertex is a person, and each edge is a friendship.
• The Web, where each vertex is a webpage, and each edge is a link.
• A campus map, where each vertex is a building, and each edge is a footpath.
• A course prerequisite diagram, where each vertex is a course, and each edge is a prerequisite.

Unlike the list, set, map, and priority queue abstract data types, which have relatively standardized
functionality focusing on the addition, retrieval, and removal of elements, the graph abstract data type is much
less standardized. Typically, graph algorithm designers will create their own graph data type to represent a
problem. The corresponding graph problem can then be represented using or adapting a standard graph
algorithm. Unlike programming with other abstract data types, much of the hard work of solving a problem
with a graph occurs when programmers decide what the vertices and edges represent, and which graph
algorithm would be appropriate to solve the problem. They also must consider the consequences of how they
represent the problem.

GLOBAL ISSUES IN TECHNOLOGY

Contact Tracing
Epidemiology is the study of how infectious diseases spread across the world. Within epidemiology, contact
tracing attempts to identify confirmed cases of disease and limit its spread by tracing contacted people and
isolating them from further spreading the disease.

Graph data structures can help epidemiologists manage the data and people involved in contact tracing.
Imagine a tracing graph where each vertex represents a person, and each edge between two people
represents a possible contact. When a person receives a positive test result for the disease, healthcare
professionals can identify everyone that person has been in contact with by tracing through the graph.

In addition to improving public health through contact tracing, our imaginary graph can also represent a
history of the spread of the disease for epidemiologists to understand how the disease moves through
communities. For example, if each vertex includes identity characteristic data such as age, race, or gender,
epidemiologists can study which groups of people are most affected by the disease. This can then inform
the distribution of vaccines to assist the most impacted groups first.

Complex Data
Now that we have seen several data structure implementations for abstract data, let us consider how these
data structures are used in practice. Recall that we compared calculators whose algorithms operated on
numbers with the idea of a computer whose algorithms operated on complex data. Data structures can be
used to represent complex data by modeling hierarchy and relationships.

We might represent an online retail store as a map associating each item with details such as an image, a brief
description, price, and the number of items in stock. This map makes some online storefront features easier to
implement than others. Given an item, this map makes it easy to retrieve the details associated with that item.
On the other hand, it is not so easy to sort items by price, sort items by popularity, or search for an item by
keywords in its description. All these features could be implemented with additional data structures. We can
combine multiple data structures together to implement these features. In computer science, we use database
systems (see Chapter 8 Data Management) that work behind the scenes in many applications to manage these
data structures and facilitate long-term storage and access to large amounts of data.

Just as calculators have algorithms for calculating numbers, computers have algorithms for computing
complex data. Data structures represent these complex data, and algorithms act on these data structures.

3.2 Algorithm Design and Discovery


Learning Objectives
By the end of this section, you will be able to:
• Understand the approach to solving algorithmic problems
• Explain how algorithm design patterns are used to solve new problems
• Describe how algorithms are analyzed

Our introduction to data structures focused primarily on representing complex data. But computer scientists
are also interested in designing algorithms for solving a wider variety of problems beyond storing and
retrieving data. For example, they may want to plan a route between a start location and an end location on a
map. Although every real-world problem is unique, computer scientists can use a general set of principles to
design solutions without needing to develop new algorithms from scratch. Just like how many data structures
can represent the same abstract data type, many different solutions exist to solve the same problem.

Algorithmic Problem Solving


An algorithm is a sequence of precise instructions that takes any input and computes the corresponding
output, while algorithmic problem-solving refers to a particular set of approaches and methods for
designing algorithms that draws on computing’s historical connections to the study of mathematical problem
solving. Early computer scientists were influenced by mathematical formalism and mathematical problem
solving. George Pólya’s 1945 book, How to Solve It, outlines a process for solving problems that begins with a
formal understanding of the problem and ends with a solution to the problem. As an algorithm's input size is
always finite, finding a solution to an algorithmic problem can always be accomplished by exhaustive search.
Therefore, the goal of algorithmic problem-solving, as opposed to mathematical problem solving, is to find an
“efficient” solution, either in terms of execution time (e.g., number of computer instructions) or space used
(e.g., computer memory size). Consequently, the study of algorithmic problem-solving emphasizes the formal
problem or task, with specific input data and output data corresponding to each input. There are many other
ways to solve problems with computers, but this mathematical approach remains the dominant approach in
the field. Here are a few well-known problems in computer science that we will explore later in this chapter.

A data structure problem is a computational problem involving the storage and retrieval of elements for
implementing abstract data types such as lists, sets, maps, and priority queues. These include:

• searching, or the problem of retrieving a target element from a collection of elements


• sorting, or the problem of rearranging elements into a logical order


• hashing, or the problem of assigning a meaningful integer index for each object

A graph problem is a computational problem involving graphs that represent relationships between data.
These include:

• traversal, or the problem of exploring all the vertices in a graph


• minimum spanning tree, or the problem of finding the lowest-cost way to connect all the vertices to each
other
• shortest path, or the problem of finding the lowest-cost way to get from one vertex to another

A string problem is a computational problem involving text or information represented as a sequence of
characters. Examples include:

• matching, or the problem of searching for a text pattern within a document


• compression, or the problem of representing information using less data storage
• cryptography, or the problem of masking or obfuscating text to make it unintelligible

Modeling
Computer scientists focus on defining a problem model, often simply called a model, which is a simplified,
abstract representation of more complex real-world problems. They apply the algorithmic problem-solving
process mentioned previously to design algorithms when defining models. Algorithms model phenomena in
the same way that data structures implement abstract data types such as lists, sets, maps, priority queues, and
graphs. But unlike abstract data types, models are not necessarily purely abstract or mathematical concepts.
Models are often linked to humans and social phenomena. A medical system might want to decide which
drugs to administer to which patients, so the algorithm designer might decide to model patients as a complex
data type consisting of age, sex, weight, or other physical characteristics. Because models represent
abstractions, or simplifications of real phenomena, a model must emphasize some details over others. In the
case of the medical system, the algorithm designer emphasized physical characteristics of people that were
deemed important and chose to ignore other characteristics, such as political views, which were deemed less
important for the model.

If an algorithm is a solution to a problem, then the model is the frame through which the algorithm designer
defines the rules and potential outcomes. Without models, algorithm designers would struggle with the
infinite complexity and richness of the world. Imagine, for example, designing a medical system that models
patients at the level of individual atoms. This model offers a detailed representation of each patient in the
most physical or literal sense. But this model is impractical because we do not know how particular
configurations and collections of atoms contribute to a person’s overall health. Compared to this atomic-scale
model, our former model consisting of age, sex, weight, and other physical characteristics is more practical for
designing algorithms, but necessarily involves erasing our individual humanity to draw certain conclusions.

In order to design algorithms, we need to be able to focus on relevant information rather than detailed
representations of the real world. Further, computer science requires a philosophical mind to aid in problem
solving. According to Brian Cantwell Smith, philosopher and cognitive and computer scientist, “Though this is
not the place for metaphysics, it would not be too much to say that every act of conceptualization, analysis, or
categorization, does a certain amount of violence to its subject matter, in order to get at the underlying
regularities that group things together.”1 Without performing this “violence,” there would be too many details
to wade through to create a useful algorithm.

The relationship between algorithms, the software they empower, and the social outcomes they produce is
currently the center of contested social and political debate. For example, all media platforms (e.g., Netflix,
Hulu, and others) use some level of targeted advertising based on user preferences in order to recommend

1 B. C. Smith, “The limits of correctness,” ACM SIGCAS Comput. Soc., vol. 14-15, no. 1-4, pp. 18-26, Jan. 1985. https://doi.org/10.1145/379486.379512.

specific movies or shows to their users. Users may not want their information to be used in this way, but there
must be some degree of compromise to make these platforms attractive and useful to people.

On the one hand, the technical definition of an algorithm is that it represents complex processes as a
sequence of precise instructions operating on data. This definition does not overtly suggest how algorithms
encode social outcomes. On the other hand, computer programs are human-designed and socially
engineered. Algorithm designers simplify complex real-world problems by removing details so that they can be
modeled as computational problems. Because software encodes and automates human ideas with computers,
software engineers wield immense power through their algorithms.

To further complicate the matter, software engineering is often a restrictive and formal discipline. Problem
modeling is constrained by the model of computation, or the rules of the underlying computer that is
ultimately responsible for executing the algorithm. Historically, computer science grew from its foundations in
mathematics and formal logics, so algorithms were specialized to solve specific problems with a modest model
of the underlying phenomena. This approach to algorithm design solves certain types of problems so long as
they can be reasonably reduced to models that operate on a modest number of variables—however many
variables the algorithm designer can keep in mind. In the case of the medical system, the algorithm designer
identified certain characteristics as particularly useful for computing a result.

But there are many other problems that defy this approach, particularly tasks that involve subtle and often
unconscious use of human sensory and cognitive faculties. An example of this is facial recognition. If asked to
describe how we recognize a particular person’s face, an algorithm designer would be challenged to identify
specific variables or combinations of variables that correspond to only a single person. The formal logic
required to define an algorithm is strict and absolute, whereas our understanding of human faces is shaped by
many subtle factors that are difficult for anyone to express using formal logic.

INDUSTRY SPOTLIGHT

Machine Learning Algorithms


A machine learning algorithm addresses these kinds of problems by using an alternative model of
computation, one that focuses on generalized algorithms designed to solve problems with a massive model
of the underlying phenomena. Instead of attempting to identify a few key variables for facial recognition,
for instance, machine learning algorithms can take as input a digital image represented as a rectangular
grid of colored pixels. While each pixel in the image offers very little information on its own, the algorithm
can learn the facial features unique to each person from the arrangements and patterns of pixels that appear
across many images of that person.

Think about the way your Apple iPhone or Google Pixel phone may look at you when you try to access it with
facial recognition enabled. The algorithm does not try to match your face to a saved picture of you, because
that would fail whenever you do not look exactly like you did in the picture. Rather, it uses machine learning
to extract patterns from a person's face and match them, making it possible to recognize people reliably,
even if they are wearing glasses that they were not wearing when they set up facial recognition on their
phone. This method seems to mimic the way humans recognize people, even if they have not seen them for
decades.

Machine learning algorithms offer a more robust approach to modeling these kinds of problems that are
not easily expressed in formal logic. But in this chapter, we focus on the earlier, classical perspective on
algorithmic problem-solving with the end goal of designing specialized algorithms to solve problems with
modest models of the underlying phenomena.


Search Algorithms
In computer science, searching is the problem of retrieving a target element from a collection that contains
many elements. There are many ways to understand search algorithms; depending on the exact context of the
problem and the input data, the expected output might differ. For example, suppose we want to find the target
term in a dictionary that contains thousands or millions of terms and their associated definitions. If we
represent this dictionary as a list, the search algorithm would return the index of the term in the dictionary. If
we represent this dictionary as a set, the search algorithm would return whether the target is in the dictionary.
If we represent this dictionary as a map, the search algorithm would return the definition associated with the
term. The choice of dictionary data structure thus has implications for the output of the search algorithm.
Algorithmic problem-solving tends to be iterative because we might later realize that our data structures need
to change. Changing the data structures, in turn, often also requires changing the algorithm design.

Despite these differences in output, the underlying canonical searching algorithm can still follow the same
general procedure. The two most well-known canonical searching algorithms are known as sequential search
and binary search, which are conducted on linear data structures, such as array lists.

• Sequential search (Figure 3.9). Open the dictionary to the first term. If that term happens to be the target,
then great—we have found the target. If not, then repeat the process by reading the next term in the
dictionary until we have checked all the terms in the dictionary.
• Binary search (Figure 3.10). Open the dictionary to a term in the middle of the dictionary. If that term
happens to be the target, then great—we have found the target. If not, then determine whether the target
comes before or after the term we just checked. If it comes before, then repeat the process except on the
first half of the dictionary. Otherwise, repeat the process on the second half of the dictionary. Each time,
we can ignore half of the remaining terms based on the place where we would expect to find the target in
the dictionary.

Figure 3.9 A sequential search can find the number 47 in an array by checking each number in order. (attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)

Figure 3.10 A binary search can find the number 47 in an array by determining whether the desired number comes before or after a
chosen number. It eliminates half of existing data points and then searches in the remaining half, repeating the pattern, until the
number is found. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
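
Both canonical searches can be written in a few lines. The Python sketch below is our own illustration rather
than code from this text; binary_search assumes its input list is already sorted.

def sequential_search(items, target):
    """Check every element in order; works whether or not items is sorted."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1  # target not found

def binary_search(items, target):
    """Repeatedly halve the search range; items must already be sorted."""
    low, high = 0, len(items) - 1
    while low <= high:
        middle = (low + high) // 2
        if items[middle] == target:
            return middle
        elif items[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return -1  # target not found

numbers = [2, 5, 11, 17, 23, 31, 47, 59]
print(sequential_search(numbers, 47))  # 6, after checking 7 elements
print(binary_search(numbers, 47))      # 6, after checking only 3 elements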

Algorithm Design Patterns


This case study of canonical search algorithms demonstrates some ideas about algorithmic problem-solving,
such as how algorithm design involves iterative improvement (from sequential search to binary search). But
this case study does not demonstrate how algorithms are designed in practice. Algorithm designers are
occasionally inspired by real-world analogies and metaphors, such as relying on sorted order to divide a
dictionary into two equal halves. More often, they depend on knowledge of an existing algorithm design
pattern, or a solution to a well-known computing problem, such as sorting and searching. Rather than
develop wholly new ideas each time they face a new problem, algorithm designers instead apply one or more
algorithm design patterns to solve new problems. By focusing on algorithm design patterns, programmers
can solve a wide variety of problems without having to invent a new algorithm every time.

For example, suppose we want to design an autocomplete feature, which helps users as they type text into a
program by offering word completions for a given prefix query. Algorithm designers begin by modeling the
problem in terms of more familiar data types.

• The input is a prefix query, such as a string of letters that might represent the start of a word (e.g., “Sea”).
• The output is a list of matching terms (completion suggestions) for the prefix query.

In addition to the input and output data, we assume that there is a list of potential terms that the algorithm
will use to select the matching terms.

There are a few different ways we could go about solving this problem. One approach is to apply the
sequential search design pattern to the list of possible words (Figure 3.11). For each term in the list, we add it
to the result if it matches the prefix query. Another approach is to first sort the list of potential terms and then
apply two binary searches: the first binary search to find the first matching term and the second binary search
to find the last matching term. The output list is all the terms between the first match and the last match.
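
The second approach maps directly onto Python's bisect module, which performs binary searches on a sorted
list. The sketch below is our own illustration; it assumes the terms are already sorted and that no term
contains the sentinel character used to mark the end of the matching range.

import bisect

def autocomplete(sorted_terms, prefix):
    """Return every term that starts with prefix, using two binary searches."""
    first = bisect.bisect_left(sorted_terms, prefix)                  # first possible match
    last = bisect.bisect_left(sorted_terms, prefix + chr(0x10FFFF))   # just past the last match
    return sorted_terms[first:last]

places = ["Dallas", "Sacramento", "Salem", "Seaside", "Seattle", "Tacoma"]
print(autocomplete(places, "Sea"))  # ['Seaside', 'Seattle']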


Figure 3.11 Sequential search needs to check every term to see if it matches the prefix “Sea,” whereas two binary searches can be
used to find the start and end points of the matching terms in the list. (attribution: Copyright Rice University, OpenStax, under CC BY
4.0 license)

TECHNOLOGY IN EVERYDAY LIFE

Online Autocomplete Algorithms


In online mapping, autocomplete might take the prefix Sea and automatically suggest the city Seattle. We
know that search algorithms solve this problem by maintaining a sorted collection of place suggestions. But
many online mapping applications allow users to change maps as the places change in the real world. How
can we design the autocomplete feature to support real-world user changes? To apply the binary search
algorithm, all place names must be stored in a sorted array-based list. So, every change will also need to
maintain the sorted order.

If we instead add all the place names to a binary search tree, what are the steps for the autocomplete
algorithm? How does this choice affect additions, changes, and removals?

Algorithm Analysis
Rather than rely on a direct analogy to our human experiences, these two algorithms for the autocomplete
feature compose one or more algorithm design patterns to solve the problem. How do we know which
approach might be more appropriate to use? One type of analysis is known as algorithm analysis, which
studies the outputs produced by an algorithm as well as how the algorithm produces those outputs. Those
outputs are then evaluated for correctness, which considers whether the outputs produced by an algorithm
match the expected or desired results across the range of possible inputs. An algorithm is considered correct
only if its computed outputs are consistent with all the expected outputs; otherwise, the algorithm is
considered incorrect.

Although this definition might sound simple, verifying an algorithm for correctness is often quite difficult in
practice, because algorithms are designed to generalize and automate complex processes. The most direct
way to verify correctness is to check that the algorithm computes the correct output for every possible input.
This is not only computationally difficult, but even potentially impossible to achieve, since some algorithms can
accept an infinite range of inputs.

Verifying the correctness of an algorithm is difficult not only due to generality, but also due to ambiguity.
Earlier, we saw how canonical searching algorithms may have different outputs according to the input

collection type. What happens if the target term contains special characters or misspellings? Should the
algorithm attempt to find the closest match? Some ambiguities can be resolved by being explicit about the
expected output, but there are also cases where ambiguity simply cannot be resolved in a satisfactory way. If
we decide to find the closest match to the target term, how does the algorithm handle cultural differences in
interpretation? If humans do not agree on the expected output, but the algorithm must compute some
output, what output does it then compute? Or, if we do not want our algorithms to compute a certain output,
how does it recognize those situations?

Correctness primarily considers consistency between the algorithm and the model, rather than the algorithm
and the real world. Our autocomplete model from earlier returned all word completions that matched the
incomplete string of letters. But in practice, this output would likely be unusable: a user typing "a" would see a
list of all words starting with the letter "a." Since our model did not specify how to order the results, the user
might get frustrated by the irrelevancy of many of the word completions. Suppose we remedy this issue by
defining a relevancy metric: every time a user completes typing a word, increase that word’s relevancy for
future autocompletion requests. But, as Safiya Noble showed in Algorithms of Oppression, determining
relevance in this universalizing way can have undesirable social impacts. Perhaps due to relevancy metrics
determined by popular vote, at one point, Google search autosuggestions included ideas like:

• Women cannot: drive, be bishops, be trusted, speak in church


• Women should not: have rights, vote, work, box
• Women should: stay at home, be in the kitchen
• Women need to: be put in their places, know their place, be controlled, be disciplined2

Noble’s critique extends further to consider the intersection of social identities such as race and gender as they
relate to the outputs of algorithms that support our daily life.

GLOBAL ISSUES IN TECHNOLOGY

Searching for Identity


In Algorithms of Oppression, Safiya Noble describes how search engines can amplify sexist, racist, and
misogynistic ideas. While searching for “Black girls,” “Latina girls,” and “Asian girls” circa 2013, Safiya was
startled by how many of the top search results and advertisements that appeared on the first page of
Google Search led to pornographic results when her input query did not at all suggest anything
pornographic. In contrast, searching for “White girls” did not include pornographic results. As algorithms
become more commonplace in our daily lives, they also become a more potent force for determining
certain social futures. Algorithms are immensely powerful in their ability to affect not only how we act, but
also what we see, what we hear, what we believe about the world, and even what we believe about
ourselves.

As the amount of input data increases, computers often need more time or storage to execute algorithms. This
condition is known as complexity, which is based on the degree of computational resources that an algorithm
consumes during its execution in relation to the size of the input. More computational time also often means
consuming more energy. Given the exponential explosion in demand for data and computation, designing
efficient algorithms is not only of practical value but also existential value as computing contributes directly to
global warming and resultant climate crises.

2 S.U. Noble, Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press, 2018.


THINK IT THROUGH

Content Moderation
Online social media platforms facilitate social relationships between users by allowing users to create and
share content with each other. This user-generated content requires moderation, or methods for managing
content shared between the platform users. Some researchers argue that content moderation defines a
social network platform; in other words, content moderation policies determine exactly what content can
be shared on the platform, which in turn defines the value of information. As social media platforms
become increasingly prevalent, the information on these platforms plays an important role in influencing
their users.

One approach for content moderation is to recruit human moderators to review toxic content, or content
that is profane, abusive, or otherwise likely to make someone disengage. An algorithm could be developed
to determine the toxicity of a piece of content, and the most toxic content could be added to a priority
queue for priority moderation.

What are the consequences of this approach? How does the definition of toxicity prioritize (or de-prioritize)
certain content? Who does it benefit? Consider the positionality of the users that interact with the platform:

• marginalized users of the platform, who may be further marginalized by this definition of toxicity.
• content moderators, who are each reviewing several hundred pieces of the most toxic content for
hours every day.
• legal teams, who want to mitigate government regulations and legislation that are not aligned with
corporate interests.
• social media hackers, or users who want to leverage the way certain content is prioritized in order to
deliberately shape public opinion.

3.3 Formal Properties of Algorithms


Learning Objectives
By the end of this section, you will be able to:
• Understand time and space complexity
• Compare and contrast asymptotic analysis with experimental analysis
• Explain the Big O notation for orders of growth

Beyond analyzing an algorithm by examining its outputs, computer scientists are also interested in examining
its efficiency by performing an algorithmic runtime analysis, a study of how much time it takes to run an
algorithm.

If you have access to a runnable program, perhaps the most practical way to perform a runtime analysis is to
time exactly how long it takes to run the program with a stopwatch. This approach, known as experimental
analysis, evaluates an algorithm’s runtime by recording how long it takes to run a program implementation of
it. Experimental analysis is particularly effective for identifying performance bugs or code that consumes
unusually large amounts of computation time or system resources, even though it produces the correct
output. In e-commerce, for example, performance bugs that result in slow website responsiveness can lead to
millions of dollars in lost revenue. In the worst-case scenario, performance bugs can even bring down entire
websites and networks when systems are overloaded and cannot handle incoming requests. As the Internet
becomes more heavily used for information and services, performance bugs can have direct impacts on health
and safety if the computer infrastructure cannot keep up with demand.
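
In its simplest form, experimental analysis is just a stopwatch around a piece of code. The Python sketch
below is our own illustration (the function and input are invented for the example); the measured time will of
course vary from machine to machine.

import time

def count_vowels(text):
    """A small function whose runtime we want to measure."""
    return sum(1 for character in text if character in "aeiou")

document = "algorithm " * 1_000_000     # a large synthetic input

start = time.perf_counter()             # start the stopwatch
count_vowels(document)
elapsed = time.perf_counter() - start   # stop the stopwatch
print(f"count_vowels took {elapsed:.3f} seconds on this machine")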

While experimental analysis is useful for improving the efficiency of a program, it is hard to use if we do not

already have a working program. Programming large systems can be expensive and time-consuming, so many
organizations want to compare multiple algorithm designs and approaches to identify the most suitable
design before implementing the system. Even with sample programs to represent each algorithm design, we
can get different results depending on the processing power, amount of memory available, and other features
of the computer that is running the program.

Designing more efficient algorithms is not just about solving problems more quickly, but about building a
more sustainable future. In this section, we will take a closer look at how to formally describe the efficiency of
an algorithm without directly executing a working program.

CONCEPTS IN PRACTICE

Performance Profiling
Modern computer systems are complicated. Algorithms are just one component in a much larger
ecosystem that involves communication between many other subsystems, other computers in a data
center, and other systems on the Internet. Algorithmic runtime analysis focuses on the properties of the
algorithm rather than all the different ways the algorithm interacts with the rest of the world. But once an
algorithm is implemented as a computer program, these interactions with the computing ecosystem play
an important role in determining program performance.

A profiler is a tool that measures the performance (runtime and memory usage) of a program. Profilers are
commonly used to diagnose real-world performance issues by producing graphs of how computational
resources are used in a program. A common graph is a flame graph (Figure 3.12) that visualizes resource
utilization by each part of a program to help identify the most resource-intensive parts of a program. Saving
even a few percentage points of resources can lead to significantly reduced time, money, and energy
expenditure. As the global demand for computation continues to increase, performance engineers who
know how to leverage profilers to analyze systems and implement resource-saving changes will be key to a
green energy future.

Figure 3.12 A flame graph shows which parts of a program require the most resources. The x-axis shows relative duration and
the width indicates the percentage of total duration spent in a function. (attribution: Copyright Rice University, OpenStax, under
CC BY 4.0 license)

Time and Space Complexity


One way to measure the efficiency of an algorithm is through time complexity, a formal measure of how
much time an algorithm requires during execution as it relates to the size of the problem. In addition to time
complexity, computer scientists are also interested in space complexity, the formal measure of how much
memory an algorithm requires during its execution as it relates to the size of the problem.

Both time and space complexity are formal measures of the efficiency of an algorithm as it relates to the size
of the problem, particularly when we are working with large amounts of complex data. For example, gravity,
the universal phenomenon by which things attract and move toward each other, can be modeled as forces that
act on every pair of objects in the universe. Simulating a subset of the universe that contains only 100


astronomical bodies will take a lot less time than a much larger universe with billions, trillions, or even more
bodies, all of which gravitate toward each other. The size of the problem plays a large role in determining how
much time an algorithm requires to execute. We will often express the size of the problem as a positive integer
number corresponding to the size of the dataset such as the number of astronomical bodies in our simulation.

The goal of time and space complexity analysis is to produce a simple and easy-to-compare characterization of
the efficiency of an algorithm as it relates to the size of the problem. Consider the following description of an
algorithm that searches for a target word in a list. Start from the very beginning of the list and check if the first
word is the target word. If it is, we have found the word. If it is not, then continue to the next word in the list
and repeat the process.

The first task is to identify a metric for representing the size of the problem. Typically, time complexity analysis
assumes asymptotic analysis, focusing on evaluating the time that an algorithm takes to produce a result as
the size of the input increases. There are two inputs to this algorithm: (1) the list of words, and (2) the target
word. The average length of an English word is about five characters, so the size of the problem is primarily
determined by the number of words in the list rather than the length of any word. (This assumption might not
be right if our dataset was instead a DNA sequence consisting of millions of nucleotides—the time it takes to
compare a pair of long DNA sequences might be more important than the number of DNA sequences being
compared.) Identifying the size of the problem is an important first task because it determines the other
factors we can consider in the following tasks.

The next task is to model the number of steps needed to execute the algorithm while considering its potential
behavior on all possible inputs. A step represents a basic operation in the computer, such as looking up a
single value, adding two values, or comparing two values. How does the runtime change as the size of the
problem increases? We can see that the “repeat” part of our description is affected by the number of words in
the list; more words can potentially lead to more repetitions.

In this case we are choosing a cost model, which is a characterization of runtime in terms of more abstract
operations, such as the number of repetitions. Rather than count single steps, we instead count repetitions.
Each repetition can involve several lookups and comparisons. By choosing each repetition as the cost model,
we declare that the few steps needed to look up and compare elements can be effectively treated as a single
operation to simplify our analysis.
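
Under this cost model, we can instrument the algorithm to count repetitions instead of timing it. The Python
sketch below is our own illustration of the idea, not code from this text.

def sequential_search_with_count(words, target):
    """Return (index, repetitions); the cost model counts loop repetitions."""
    repetitions = 0
    for index, word in enumerate(words):
        repetitions += 1                  # one repetition = one lookup-and-compare
        if word == target:
            return index, repetitions
    return -1, repetitions

words = ["apple", "banana", "cherry", "date", "elderberry"]
print(sequential_search_with_count(words, "apple"))  # (0, 1): best case
print(sequential_search_with_count(words, "fig"))    # (-1, 5): worst case, N = 5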

However, this analysis is not quite complete. We might find the target word early in the list even if the list is
very large. Although we defined the size of the problem as the number of words in the list, the size of the
problem does not account for the exact words and word ordering in the list. Computer scientists say that this
algorithm has a best-case situation when the word can be found at the beginning of the list, and a worst-case
situation when the word can only be found at the end of the list (or, perhaps, not even in the list at all). One
way to account for the variation in runtime is via case analysis, which is based on factors other than the size
of the problem.

Finally, we can formalize our description using either precise English or a special mathematical notation called
Big O notation, which is the most common type of asymptotic notation in computer science used to measure
worst-case complexity. In precise English, we might say that the time complexity for this sequential search
algorithm has two cases (Figure 3.13):

• In the best-case situation (when the target word is at the start of the list), sequential search takes just one
repetition to find the word in the list.
• In the worst-case situation (when the target word is either at the end of the list or not in the list at all),
sequential search takes N repetitions where N is the number of words in the list.

Figure 3.13 The best case for sequential search in a sorted list is to find the word at the top of the list, whereas the worst case is to
find the word at the bottom of the list (or not in the list at all). (attribution: Copyright Rice University, OpenStax, under CC BY 4.0
license)

While this description captures all the ideas necessary to communicate the time complexity, computer
scientists will typically enhance this description with mathematics to convey a geometric idea of the runtime.
The order of growth is a geometric prediction of an algorithm’s time or space complexity as a function of the
size of the problem (Figure 3.14).

• In the best case, the sequential search has a constant order of growth that does not take more resources
as the size of the problem increases.
• In the worst case, the sequential search has a linear order of growth where the resources required to run
the algorithm increase at about the same rate as the size of the problem increases. This is with respect to
N, the number of words in the list.

Constant and linear are two examples of orders of growth. An algorithm with a constant order of growth takes
the same amount of time to execute even as the size of the problem grows larger and larger—no matter how
large the dictionary is, it is possible to find the target word at the very beginning. In contrast, an algorithm
with a linear time complexity will take more time to execute as the size of the problem grows larger, and we
can predict that an increase in the size of the problem corresponds to roughly the same increase in the
runtime.

This prediction is a useful outcome of time complexity analysis. It allows us to estimate the runtime of the
sequential search algorithm on a problem of any size, before writing the program or obtaining a dictionary of
words that large. Moreover, it helps us decide if we want to use this algorithm or explore other algorithm
designs and approaches. We might compare this sequential search algorithm to the binary search algorithm
and adjust our algorithm design accordingly.


Figure 3.14 The order of growth of an algorithm is useful to estimate its runtime efficiency as the input size increases (e.g., constant,
logarithmic, and other orders of growth) to help determine which algorithmic approach to take. (attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)

Big O Notation for Orders of Growth


In the 1970s, computer scientists applied asymptotic notation, a mathematical notation that formally defines
the order of growth. We can use Big O notation to describe the time complexity of the sequential search
algorithm. In general, we say a function f(N) is in the class of O(g(N)), denoted by f(N) = O(g(N)) or f(N) in
O(g(N)), if, as N tends to infinity, the ratio f(N)/g(N) is bounded above by some constant. The function g(N) is
usually some simple function that defines the order of growth such as g(N) = 1 (constant function), g(N) = N
(linear function), g(N) = log N (logarithmic function), or other functions as follows:

• In the best case, the order of growth of sequential search is in O(1).


• In the worst case, the order of growth of sequential search is in O(N) with respect to N, the number of
words in the list.

The constant order of growth is described in Big O notation as “O(1)” while the linear order of growth is
described in Big O notation as “O(N) with respect to N, the size of the problem.” Big O notation formalizes the
concept of a prediction. Given the size of the problem, N, calculate how long it takes to run the algorithm on a
problem of that size. For large lists, in order to double the worst-case runtime of sequential search, we would
need to double the size of the list.

O(1) and O(N) are not the only orders of growth.

• O(1), or constant.
• O(log N), or logarithmic.
• O(N), or linear.
• O(N log N), or linearithmic.
• O(N^2), or quadratic.
• O(N^3), or cubic.
• O(2^N), O(3^N), . . . , or exponential.
• O(N!), or factorial.

The O(log N), or logarithmic, order of growth appears quite often in algorithm analysis. The logarithm of a

large number tells how many times it needs to be divided by a small number until it reaches 1. The binary
logarithm, or log₂, tells how many times a large number needs to be divided by 2 until it reaches 1. In the
worst case, the time complexity of sequential search is in O(N) with respect to N, the number of words in the
list, since each repetition of sequential search rules out one remaining element. How about binary search? In
the worst case, the time complexity of binary search is in O(log N) with respect to N, the number of words in
the list, since each repetition of binary search rules out half the remaining elements.

Another way to understand orders of growth is to consider how a change in the size of the problem results in a
change to the resource usage. When we double the size of the input problem, algorithms in each order of
growth respond differently (Figure 3.15).

• O(1) algorithms will not require any more resources.


• O(log N) algorithms will require 1 additional resource unit.
• O(N) algorithms will require 2 times the number of resources.
• O(N log N) algorithms will require a little more than 2 times the number of resources.
• O(N^2) algorithms will require 4 times the number of resources.
• O(N^3) algorithms will require 8 times the number of resources.
• O(2^N), O(3^N), . . . algorithms will require the squared or cubed number of resources.
• O(N!) algorithms will require even more.

This growth compounds, so quadrupling the size of the problem for an O(N^2) algorithm will require 16 times
the number of resources. Algorithm design and discovery is often motivated by these massive differences
between orders of growth. Note that this explanation of how each order of growth responds differently
oversimplifies the problem. Rigorously speaking, a function f(N) expressed in Big O notation as being in
the class of O(g(N)) can be much more complex than the simple function g(N). For example, f(N) = 4 log N + 100
log(log N) is in O(log N), but when N doubles, f(2N) is not just one unit more than the original
function f(N). A similar argument applies to all functions other than the constant O(1) function.
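
A small doubling experiment makes the contrast concrete. This sketch (our own illustration) counts worst-case
repetitions rather than measuring wall-clock time, so the numbers are exact rather than machine-dependent.

def sequential_steps(n):
    """Worst-case repetitions of sequential search on n elements."""
    return n

def binary_steps(n):
    """Worst-case repetitions of binary search: halve until one element remains."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps + 1

for n in [1_000, 2_000, 4_000]:
    print(n, sequential_steps(n), binary_steps(n))
# 1000 1000 10    doubling n doubles the O(N) count
# 2000 2000 11    but adds only about one step to the O(log N) count
# 4000 4000 12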

Figure 3.15 This chart shows the time it would take for an algorithm with each of the given orders of growth to finish running on a
problem of the given size, N. When an algorithm takes longer than 10^25 years to compute, that means it takes longer than the
current age of the universe. (data source: Geeks for Geeks, “Big O Notation Tutorial—A Guide to Big O Analysis,” last updated March
29, 2024; attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

As the size of the problem increases, algorithmic complexity becomes a larger and larger factor. For problems
dealing with just 1,000 elements, the time it would take to run an exponential-time algorithm on that problem
exceeds the current age of the universe. In practice, across applications working with large amounts of data,
O(N log N) is often considered the limit for real-world algorithms. Even then, O(N log N) algorithms cannot be
run too frequently. For algorithms that need to run frequently on large amounts of data, algorithm designers
target O(N), O(log N), or O(1).


TECHNOLOGY IN EVERYDAY LIFE

Arranging Invisible Icons in Quadratic Time


Have you ever been annoyed by computer slowness? For some users, opening the start menu can take 20
seconds because of an O(N^2) algorithm, where N is the number of desktop files. The Microsoft Windows
computer operating system allows users to organize files directly on top of their desktop wallpaper. A
quadratic-time algorithm arranges these desktop icons in a grid layout so that they fill column-by-column
starting from the left side of the screen. The algorithm is executed whenever the user opens the start menu
or launches the file explorer.

Most users only keep a couple dozen desktop icons, so the O(N^2) algorithm takes microseconds—practically
unnoticeable. But for users with hundreds of desktop icons, the impact of each additional icon adds up.
With 1,000 desktop files, launching the start menu takes 20 seconds. With 10,000 desktop icons, the
runtime grows to 30 minutes!

To avoid the clutter of thousands of desktop icons, users can hide desktop icons. But this does not prevent
the quadratic-time algorithm from running. Users with too many desktop icons beware: your computer
slowness may be due to arranging invisible icons in quadratic time.3

3.4 Algorithmic Paradigms


Learning Objectives
By the end of this section, you will be able to:
• Apply the divide and conquer technique
• Explain the brute-force method
• Interpret and apply the greedy method
• Understand how to apply reductions to solve problems

Algorithm design patterns are solutions to well-known computing problems. In 3.5 Sample Algorithms by
Problem, we will survey algorithm design patterns by problem. As it turns out, many of these algorithm design
patterns share similarities in their approaches to solving problems. Here, we will introduce the algorithmic
paradigm, which is the common concepts and ideas behind algorithm design patterns.

Divide and Conquer Algorithms


A divide and conquer algorithm is an algorithmic paradigm that breaks down a problem into smaller
subproblems (divide), recursively solves each subproblem (conquer), and then combines the result of each
subproblem to form the overall solution. The algorithmic idea of recursion is fundamental to divide and conquer
algorithms because it solves complex problems by dividing input data into smaller instances of the same
problem, known as subproblems. Such recursive calls terminate when the inputs become so small or so simple
that other, non-recursive procedures can provide the answers.

A subproblem is a smaller instance of a problem that can be solved independently, and each subproblem can
be solved independently of other subproblems by reapplying the same recursive algorithm. To design
recursive subproblems, algorithm designers often focus on identifying structural self-similarity in the input
data. This process repeats until the input data is small enough to solve directly. Once all the subproblems have
been solved, the recursive algorithm reassembles each of these independent solutions to compute the result
for the original problem.

3 B. Dawson, "Arranging invisible icons in quadratic time," 2021. https://randomascii.wordpress.com/2021/02/16/arranging-invisible-icons-in-quadratic-time/

Earlier, we introduced binary search to find a target within a sorted list as an analogy for finding a term in a
dictionary sorted alphabetically. Instead of starting from the beginning of the dictionary and checking each
term, as in a sequential search, we could instead start from the middle and look left or right based on where
we would expect to find the term in the dictionary. But binary search can also be understood as an example of
a divide and conquer algorithm (Figure 3.16).

1. The problem of finding a target within the entire sorted list is broken down (divided) into the
subproblem of finding a target within half of the list after comparing the middle element to the target.
Half of the list can be ruled out based on this comparison, leaving binary search to find the target
within the remaining half.
2. Binary search is repeated on the remaining half of the sorted list (conquer). This process continues
recursively until the target is found in the sorted list (or reported as not in the list at all).
3. To solve the original problem of finding a target within the entire sorted list, the result of the
subproblem must inform the overall solution. The original call to binary search reports the same result
as its subproblem.

Figure 3.16 Binary search is a divide and conquer algorithm that repeatedly makes one recursive call on half of remaining elements.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Binary search makes a single recursive call on the remaining half of the list as part of the conquer step, but
other divide and conquer algorithms make multiple recursive calls to solve their problems. We will also see
algorithms that do a lot of work in the final step of combining results more than just reporting the same result
as a subproblem.

Given a list of elements in an unknown order, a sorting algorithm should return a new list containing the same
elements rearranged into a logical order, such as least to greatest. One canonical divide and conquer
algorithm for comparison sorting is called merge sort. The problem of comparison sorting is grounded in the
comparison operation. The comparison operation is like a traditional weighing scale that tells whether one
element is heavier, lighter, or the same weight as another element, but provides no information about the
exact weight or value of the element. Though this might seem like a serious restriction, comparison sorting is
actually a very rich problem; it is perhaps the most deeply studied problem in computer science. Choosing
comparison as the fundamental operation is also practical for complex data, where it might
be hard (or even impossible) to assign an exact numeric ranking (Figure 3.17).

1. The problem of sorting the list is broken down (divided) into two subproblems: the subproblem of
sorting the left half and the subproblem of sorting the right half.
2. Merge sort is repeated to sort each half (conquer). This process continues recursively until the sublists
are one element long. To sort a one-element list, the algorithm does not need to do anything, since the
elements in the list are already arranged in a logical order.
3. To solve the original problem of sorting the entire list, combine adjacent sorted sublists by merging
them while maintaining sorted order. Merging each pair of adjacent sorted sublists repeats to form
larger and larger sorted sublists until the entire list is fully sorted.


Figure 3.17 Merge sort is a divide and conquer algorithm that repeatedly makes two recursive calls on both halves of the sublist.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
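
A direct Python rendering of this divide and conquer recipe looks like the sketch below (our own illustration,
not code from this text): merge_sort divides and conquers, and merge performs the combining step.

def merge_sort(items):
    """Sort a list by dividing it in half, sorting each half, and merging the results."""
    if len(items) <= 1:
        return items                      # a one-element (or empty) list is already sorted
    middle = len(items) // 2
    left = merge_sort(items[:middle])     # conquer the left half
    right = merge_sort(items[middle:])    # conquer the right half
    return merge(left, right)             # combine the two sorted halves

def merge(left, right):
    """Merge two sorted lists into one sorted list using only comparisons."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # at most one of these still has leftovers
    merged.extend(right[j:])
    return merged

print(merge_sort([31, 5, 47, 2, 59, 11]))  # [2, 5, 11, 31, 47, 59]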

Brute-Force Algorithms
Solving a combinatorial problem involves identifying the best candidate solution out of a space of many
potential solutions. Each solution to a combinatorial problem is represented as a complex data type. Many
applications that involve simulating and comparing different options can be posed as combinatorial problems.

• Route planning in online mapping asks, “Out of all the possible routes from point A to point B, which route
is the shortest?” (Shortest paths problem)
• Municipal broadband planning asks, “Out of all the possible ways to connect every real-world address to
the Internet, which network of connections is the cheapest to build?” (Minimum spanning trees problem)
• The interval scheduling problem is a combinatorial problem involving a list of scheduled tasks with the
goal of finding the largest non-overlapping set of tasks that can be completed.

Rather than searching for a single element, combinatorial problems focus on finding candidate solutions that
might be represented as a list of navigation directions, network connections, or task schedules. And unlike
sorting, where the output should be a sorted list, combinatorial problems often attempt to quantify or
compare the relative quality of solutions in order to determine the best candidate solution out of a space of
many potential solutions. Even if our route plan is not the “best” or shortest route, it can still be a valid solution
to the problem.

A brute-force algorithm solves combinatorial problems by systematically enumerating all potential solutions
in order to identify the best candidate solution. For example, a brute-force algorithm for generating valid
credit card numbers might start by considering a credit card number consisting of all zeros, then all zeros
except for a one in the last digit place, and so forth, to gradually explore all the possible values for each digit
place.

Brute-force algorithms exist for every combinatorial problem, but they are not typically used because of
runtime issues. To systematically enumerate all potential solutions, a brute-force algorithm must generate
every possible combination of the input data. For example, if a credit card number has sixteen digits and each
digit can have any value between zero and nine, then there are 10^16 potential credit card numbers to
enumerate. The combinatorial explosion is the exponential number of solutions to a combinatorial problem
that makes brute-force algorithms unusable in practice. Despite continual improvements to how quickly
computers can execute programs, exponential time brute-force algorithms are impractical except for very
small problem input data.
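
As a rough illustration of systematic enumeration, the following Python sketch generates every candidate
number with a given number of digits; the is_valid check is a hypothetical placeholder for whatever test defines
a correct solution, and the approach is only feasible for very small inputs.

from itertools import product

def brute_force_search(num_digits, is_valid):
    # Systematically enumerate every candidate and return the first valid one.
    # Each digit place can hold any value from 0 through 9, so there are
    # 10 ** num_digits candidates in total: the combinatorial explosion.
    for digits in product(range(10), repeat=num_digits):
        candidate = "".join(str(d) for d in digits)
        if is_valid(candidate):
            return candidate
    return None

# Feasible only for a handful of digits; 16 digits would mean 10^16 candidates.
print(brute_force_search(4, lambda number: number == "0042"))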

INDUSTRY SPOTLIGHT

Protein Folding
Proteins are one of the fundamental building blocks of biological life. The 3-D shape of a protein defines
what it does and how it works. Given the string of a protein’s amino acids, a protein-folding problem asks us
to compute the 3-D shape of the resulting protein.

Protein folding has been studied since the first images of their structures were created in 1960. In 1972,
Christian Anfinsen won the Nobel Prize in Chemistry for his research into the “protein-folding problem,”
which involved algorithms to predict the structure of proteins. Because of the huge number of possible
formations of proteins, computational studies and algorithms are better able to predict the structures.
Given that a brute-force algorithm cannot solve this problem in a reasonable amount of time,
computational biologists have developed algorithms that generate high-quality approximations or potential
solutions that typically are not quite correct, but run in a more reasonable amount of time.

Modern protein-folding algorithms, such as Google’s DeepMind AlphaFold machine-learning algorithm,4
use machine learning to identify protein-folding patterns from millions of input amino acid sequences and
corresponding output 3-D conformations. Rather than utilizing a simple rule for selecting the next element
to include in the solution, these machine learning algorithms learn highly complicated rulesets from subtle
patterns present in the data.

Improving our understanding of protein folding can not only lead to massive improvements in medical health
contexts such as drug and vaccine development, but also enable us to design biotechnologies such as
enzymes for composting plastic waste, and even limit the impact of global warming by sequestering
greenhouse gases from the atmosphere.5 In 2024, the Nobel Prize Committee recognized the impact of
this work by granting the Chemistry prize to Demis Hassabis and John M. Jumper for their work at DeepMind
on AlphaFold,6 as well as David Baker for using a similar tool, Rosetta, to create entirely new proteins.

Greedy Algorithms
A greedy algorithm solves combinatorial problems by repeatedly applying a simple rule to select the next
element to include in the solution. Unlike brute-force algorithms that solve combinatorial problems by
generating all potential solutions, greedy algorithms instead focus on generating just one solution. These
algorithms are greedy because they select the next element to include based on the immediate benefit.

For example, a greedy interval scheduling algorithm might choose to work on the task that takes the least
amount of time to complete; in other words, the cheapest way to mark one task as completed (Figure 3.18). If
the tasks are scheduled in advance and we can only work on one task at a time, choosing the task that takes
the least amount of time to complete might prevent us from completing multiple other (longer) tasks that just
so happen to overlap in time. This greedy algorithm does not compute the right output—it finds a solution but
not the optimal solution.

4 W. D. Heaven, “DeepMind’s protein-folding AI has solved a 50-year-old grand challenge of biology,” 2020.
https://www.technologyreview.com/2020/11/30/1012712/deepmind-protein-folding-ai-solved-biology-science-drugs-disease/
5 Google DeepMind, “AlphaFold: A solution to a 50-year-old grand challenge in biology,” 2020.
https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
6 Google DeepMind, “AlphaFold: Demis Hassabis & John Jumper awarded Nobel Prize in Chemistry,” 2024.
https://deepmind.google/discover/blog/demis-hassabis-john-jumper-awarded-nobel-prize-in-chemistry/


Figure 3.18 Greedy interval scheduling will not work if the simple rule repeatedly selects the shortest interval. (attribution: Copyright
Rice University, OpenStax, under CC BY 4.0 license)
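
A minimal Python sketch of the flawed greedy rule described above (repeatedly pick the shortest task whose
time interval does not overlap an already-selected task) might look like the following; the representation of
each task as a (start, end) pair is our own assumption.

def greedy_shortest_first(tasks):
    # Simple rule from the text: always take the shortest remaining task that
    # does not overlap anything already selected. This finds a solution, but
    # not necessarily the optimal (largest) set of non-overlapping tasks.
    selected = []
    for start, end in sorted(tasks, key=lambda task: task[1] - task[0]):
        overlaps = any(start < other_end and other_start < end
                       for other_start, other_end in selected)
        if not overlaps:
            selected.append((start, end))
    return selected

# The short middle task blocks both longer, non-overlapping tasks.
print(greedy_shortest_first([(0, 10), (9, 11), (10, 20)]))  # [(9, 11)]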

The majority of greedy algorithms do not always compute the best solution. But there are also certain
scenarios in which cleverly crafted greedy algorithms guarantee finding optimal solutions. The problems they
solve are formulated to ensure that the greedy algorithm never makes a mistake when repeatedly applying
the simple rule to select the next element.

Consider the municipal broadband planning problem or, more formally, the minimum spanning tree problem,
of finding a lowest-cost way to connect all the vertices in a connected graph to each other. If we want to
minimize the sum of the selected edge weights, one idea is to repeatedly select the next edge (a connection
between vertices) with the lowest weight so long as it extends the reach of the network. Or, in the context of
the municipal broadband planning problem, we want to ensure that the next-cheapest connection that we
choose to build reaches someone who needs access to the Internet (Figure 3.19).

Figure 3.19 Municipal broadband planning can be represented as a minimum spanning trees graph problem where the weight of
each edge represents the cost of building a connection between two vertices or places. (attribution: Copyright Rice University,
OpenStax, under CC BY 4.0 license)

GLOBAL ISSUES IN TECHNOLOGY

Municipal Broadband as a Public Utility


The municipal broadband planning problem is just one component of the larger public policy issue of
Internet access. As the Internet and the use of digital platforms become the standard mode for
communication, many people are viewing municipal broadband as a fundamental public utility and civil
right. “Municipal broadband can solve access and affordability problems in areas where private ISPs
[Internet service providers] have not upgraded networks to modern speeds, fail to provide service to all
residents, and/or charge outrageous rates.”7

While a minimum spanning tree algorithm can solve the municipal broadband planning problem, the
challenges of deploying municipal broadband for everyone are more political than algorithmic. But
other algorithms can also directly contribute to the way in which society understands the problem. For
example, we might use algorithms to better visualize and understand who currently has access to
affordable and reliable high-speed Internet. We can design algorithms to ensure equitable distribution,
deployment, and integration of new technologies so that marginalized communities can realize the positive
economic benefits first. Or we can reconfigure the minimum spanning trees problem model to specifically
account for expanding network access in an equitable fashion.

7 J. Brodkin, “Victory for municipal broadband as Wash. state lawmakers end restrictions,” 2021. Ars Technica,
https://arstechnica.com/tech-policy/2021/04/victory-for-municipal-broadband-as-wash-state-lawmakers-end-restrictions

Computer scientists have designed algorithms that repeatedly apply this simple rule to find an optimal
minimum spanning tree:

• Kruskal’s algorithm, a greedy algorithm which sorts the list of edges in the graph by weight.
• Prim’s algorithm, a greedy algorithm that maintains a priority queue of vertices in the graph ordered by
connecting edge weight.

For most problems, greedy algorithms will not produce the best solution. Instead, algorithm designers
typically turn to another algorithmic paradigm called dynamic programming. Still, greedy algorithms provide a
useful starting point for understanding problems and designing baseline algorithms for generating
potential solutions.

Reduction Algorithms
Algorithm designers spend much of their time modeling problems by selecting and adapting relevant data
structures and algorithms to represent the problem and a solution in a computer program. This process of
modeling often involves modifying an algorithm design pattern so that it can be applied to the problem. But
there is also a different approach to algorithm design that solves problems by changing the input data and
output data to fit a preexisting problem. Rather than solve the problem directly, a reduction algorithm solves
problems by transforming them into other problems (Figure 3.20).

1. Preprocess: Transform the input data so that it is acceptable to an algorithm meant for the other
problem.
2. Apply the algorithm meant for the other problem on the preprocessed data.
3. Postprocess: Transform the output of the algorithm meant for the other problem so that it matches the
expected output for the original problem.

Figure 3.20 A reduction algorithm preprocesses the input data, passes it to another algorithm, and then postprocesses the
algorithm’s output to solve the original problem. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Consider a slightly different version of the municipal broadband planning problem, where instead of only
considering connections (edges), we expand the problem to consider the possibility of installing broadband
nodes directly to each address without relying on potentially expensive neighboring connections. That is, all
vertices installed with broadband nodes are inter-connected with each other through another broadband
network. If we were to run an algorithm for solving the minimum spanning tree problem on this graph
directly, then our result would never consider installing broadband nodes directly, since minimum spanning
tree algorithms do not consider vertex weights (Figure 3.21).


Figure 3.21 The problem of finding a minimum spanning tree in a graph with vertex weights can be reduced to the problem of
finding a minimum spanning tree in a graph without vertex weights. (attribution: Copyright Rice University, OpenStax, under CC BY
4.0 license)

We say that this more complicated municipal broadband planning problem reduces to the minimum spanning
tree problem because we can design a reduction algorithm consisting of preprocessing and postprocessing
procedures.

• Preprocess: Introduce an extra vertex that does not represent a real location but connects to every
address (vertex) in the city. The edge weight of each connection is the cost of installing the broadband
node directly at that location.
• Postprocess: After computing a minimum spanning tree for the preprocessed graph, remove the extra
vertex that we added during the preprocessing step. Any edges connected to the extra vertex represent a
direct broadband node installation, while other edges between real locations are just the same network
connections that we saw earlier.
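
A sketch of how these preprocessing and postprocessing steps could be expressed is shown below, assuming
the graph is given as a list of weighted edges and the direct installation costs as a dictionary; the names and
representations are illustrative only.

def preprocess(edges, install_costs):
    # Preprocess: add a virtual vertex connected to every address, with each
    # edge weight equal to that address's direct installation cost.
    # edges is a list of (u, v, weight); install_costs maps vertex -> cost.
    augmented = list(edges)
    for vertex, cost in install_costs.items():
        augmented.append(("virtual", vertex, cost))  # "virtual" is not a real place.
    return augmented

def postprocess(mst_edges):
    # Postprocess: edges touching the virtual vertex become direct installations;
    # every other selected edge is an ordinary network connection.
    installs = [edge for edge in mst_edges if "virtual" in (edge[0], edge[1])]
    connections = [edge for edge in mst_edges if "virtual" not in (edge[0], edge[1])]
    return installs, connections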

Reduction algorithms enable algorithm designers to solve problems without having to modify existing
algorithms or algorithm design patterns. Reduction algorithms allow algorithm designers to rely on optimized
canonical algorithms rather than designing a solution by composing algorithm design patterns, which can lead
to performance or correctness bugs. Reduction algorithms also enable computer scientists to make claims
about the relative difficulty of a problem. If we know that a problem A reduces to another problem B, then B is
at least as difficult to solve as A.

3.5 Sample Algorithms by Problem


Learning Objectives
By the end of this section, you will be able to:
• Discover algorithms that solve data structure problems
• Understand graph problems and related algorithms

Earlier, we introduced several computing problems, like searching for a target value in a list or implementing
an abstract data type (lists, sets, maps, priority queues, graphs). Although every computational problem is
unique, these types of problems often share significant similarities with other problems. Computer scientists
have identified many canonical problems that represent these common problem templates. Although each of
these canonical problems may have many algorithmic solutions, computer scientists have also identified
canonical algorithms for solving these problems. In this section, we will introduce canonical problems and
survey canonical algorithms for each problem.

Data Structure Problems


Data structure problems are not only useful for implementing data structures, but also serve as fundamental
algorithm design patterns for organizing data to enable efficient solutions to almost every other computing
problem.

Searching
Searching is the problem of retrieving a target element from a collection of elements. Searching in a linear
data structure, such as an array list, can be done using either sequential search or binary search.

Sequential Search Algorithm

A sequential search algorithm is a searching algorithm that sequentially checks the collection element-by-
element for the target. The runtime of sequential search is in O(N) with respect to N, the number of elements
in the list (Figure 3.22).

Figure 3.22 Sequential search is a search algorithm that checks the collection element by element to find a target element.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
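
A minimal Python sketch of sequential search over a list might look like this; returning -1 when the target is
absent is our own convention, not something fixed by the text.

def sequential_search(elements, target):
    # Check the collection element by element; O(N) comparisons in the worst case.
    for index, element in enumerate(elements):
        if element == target:
            return index   # Found the target at this position.
    return -1              # The target is not in the collection.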

Binary Search Algorithm

A binary search algorithm recursively narrows down the possible locations for the target in the sorted list.
Initially, the algorithm has no information—the target can be anywhere in the entire range of the sorted list.
By comparing the target to the middle element of the range, we can rule out half of the elements in the range.
This process can be repeated until we have found the expected location of the target in the sorted list. The
runtime of binary search is in O(log N) with respect to N, the number of elements in the sorted list, so long as
indexing is a constant-time operation (see Figure 3.11).
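
The same idea can be written iteratively. The following Python sketch repeatedly compares the target to the
middle element of the remaining range; it assumes a sorted list with constant-time indexing and returns -1
when the target is absent.

def binary_search(sorted_elements, target):
    # Repeatedly rule out half of the remaining range; O(log N) comparisons.
    low, high = 0, len(sorted_elements) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_elements[middle] == target:
            return middle          # Found the target.
        elif sorted_elements[middle] < target:
            low = middle + 1       # The target can only be in the right half.
        else:
            high = middle - 1      # The target can only be in the left half.
    return -1                      # The target is not in the sorted list.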

Although the sequential search algorithm works with any collection type such as lists, sets, dictionaries, and
priority queues, the binary search algorithm relies on a sorted list with access to elements by index.
Consequently, binary search is efficient on array lists that provide constant-time access to the element at any
index and inefficient on linked lists, which do not enable constant-time access to elements by index. Binary
search relies on the structure of the sorted list to repeatedly rule out half of the remaining elements.

Binary search trees represent the concept of binary search in a tree data structure by arranging elements in
the tree in sorted order from left to right. Ideally, the root node represents the middle element in the sorted
tree and each child roughly divides each subtree in half. But, in the worst case, a binary search tree can look
exactly like a linked list where each node contains either zero children or one child. Although such a tree still
arranges its elements in sorted order from left to right, comparing the target to each node only reduces the
size of the problem by one element rather than ruling out half of the remaining elements—the worst-case
binary search tree degrades to sequential search. Balanced binary search trees, such as AVL trees, address
this worst-case scenario by maintaining the AVL property of balance between left and right subtrees.

Sorting
Sorting is the problem of rearranging elements into a logical order, typically from least-valued (smallest) to
greatest-valued (largest). Sorting is a fundamental problem not only because of the tasks that it directly solves,
but also because it is a foundation for many other algorithms such as the binary search algorithm or Kruskal’s
algorithm for the minimum spanning tree problem.

The most common type of sorting algorithm solves the problem of comparison sorting, or sorting a list of
elements where elements are not assigned numeric values but rather defined in relation to other elements.
For simplicity, the data are typically assumed to be stored in an array list for indexed access, and the sorting
algorithm can either return a new sorted list or rearrange the elements in the array list so that they appear in
sorted order.

Merge Sort Algorithm

A merge sort algorithm recursively divides the data into sublists until sublists are one element long—which we
know are sorted—and then merges adjacent sorted sublists to eventually return the sorted list. The merge
operation combines two sorted sublists to produce a new, larger sorted sublist containing all the elements in
sorted order. The actual rearranging of elements occurs by repeatedly applying the merge operation on pairs
of adjacent sorted sublists, starting with the smallest single-element sublists, to larger two-element sublists,
and eventually reaching the two halves of the entire list of elements. The runtime of merge sort is in O(N log N)
with respect to N, the number of elements (see Figure 3.17).

Quicksort Algorithm

A quicksort algorithm recursively sorts data by applying the binary search tree algorithm design pattern to
partition data around pivot elements. Whereas the merge sort algorithm rearranges elements by repeatedly
merging sorted sublists after each recursive subproblem, the quicksort algorithm rearranges elements by
partitioning data before each recursive subproblem. The partition operation takes a pivot and rearranges the
elements into three sections, from left to right: the sublist of all elements less than the pivot, the pivot
element, and the sublist of all elements greater than (or equal to) the pivot. Each of the two sublists resulting
from the partition operation is a recursive quicksort subproblem; when both of the sublists are sorted, the
entire list will be sorted. The runtime of quicksort depends on the choice of each pivot element during the
execution of the recursive algorithm, but in practice, for most of the inputs, the runtime is in O(N log N) with
respect to N, the number of elements (Figure 3.23).

Figure 3.23 Quicksort is a divide and conquer sorting algorithm that sorts elements by recursively partitioning elements around a
pivot. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
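
A minimal Python sketch of quicksort, using the first element as the pivot for simplicity (real implementations
choose pivots more carefully), could look like this:

def quicksort(elements):
    # Recursively partition elements around a pivot; O(N log N) for most inputs.
    if len(elements) <= 1:
        return elements            # Zero or one element is already sorted.
    pivot = elements[0]            # A simple (not always ideal) pivot choice.
    less = [e for e in elements[1:] if e < pivot]
    greater_or_equal = [e for e in elements[1:] if e >= pivot]
    # Left sublist, then the pivot, then the right sublist.
    return quicksort(less) + [pivot] + quicksort(greater_or_equal)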

Heapsort Algorithm

A heapsort algorithm adds all elements to a binary heap priority queue data structure using the comparison
operation to determine priority value, and returns the sorted list by repeatedly removing from the priority
queue element by element. The runtime of heapsort is in O(N log N) with respect to N, the number of
elements. The logarithmic time factor is due to the time it takes to add or remove each element from the
binary heap (Figure 3.24).

Figure 3.24 Heapsort uses the binary heap data structure to sort elements by adding and then removing all elements from the heap.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
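
Using Python’s built-in binary heap module, a heapsort sketch might look like the following; heapq.heapify
builds the heap and each heappop removes the smallest remaining element.

import heapq

def heapsort(elements):
    # Build a binary heap from all elements, then remove them in priority order.
    heap = list(elements)
    heapq.heapify(heap)            # Turn the array into a binary heap.
    # Each removal is O(log N), so the whole sort is O(N log N).
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]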

Many comparison sorting algorithms share the same O(N log N) runtime bound with respect to N, the number
of elements. Computer scientists have shown with a combinatorial proof that, in the worst case, a comparison
sorting algorithm cannot do better than O(N log N) comparisons. It is impossible to design a worst-case O(N)
runtime comparison sorting algorithm. In practice, a commonly used version of heapsort (whose heap-building
phase is asymptotically faster) first arranges the input numbers in an array and then repeatedly calls a function
to turn the original array into a binary heap; one can show that the running time of this construction is linear
in N, which is why it is faster than building the heap by adding numbers one by one. The algorithm then
removes elements one by one from the end of the heap.

But not all sorting problems are comparison sorting problems. Another type of sorting problem that is not
restricted to pairwise comparisons is known as count sorting, or
sorting a list of elements by organizing elements into categories and rearranging the categories into a logical
order. For example, the problem of sorting a deck of cards can be seen as a count sorting problem if we put
the cards into numeric stacks and then rearrange the stacks into a logical order. By changing the assumptions
of the problem, count sorting algorithms can run in O(N) time by first assigning each element to its respective
category and then unpacking each category so that elements appear in sorted order.
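
A minimal sketch of count sorting, using the playing-card example and assuming the full set of categories is
known in advance, might look like this:

def count_sort(cards, categories):
    # Assign each element to its category, then unpack the categories in order.
    # Runs in O(N) time for a fixed set of categories.
    stacks = {category: [] for category in categories}
    for card in cards:
        stacks[card].append(card)
    ordered = []
    for category in categories:    # Categories are already in logical order.
        ordered.extend(stacks[category])
    return ordered

# Sorting card ranks 1 through 13 without any pairwise comparisons.
print(count_sort([7, 2, 13, 2, 7, 1], categories=range(1, 14)))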

Hashing
Hashing is the problem of designing efficient algorithms which map each object to an integer so that most (if
not all) objects will be assigned distinct integers. Although hashing algorithms are often specific to each data
type, there exist some general approaches for designing hashing algorithms. The hash value of a simple data
type such as an integer can just be the value of the integer itself. The hash value of a string of text can be
some mathematical combination of the numeric value of each character in the string. Likewise, the hash value
of a collection such as a list can be some combination of the underlying numeric data within each element in
the collection.
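
One common way to combine the characters of a string into a hash value is a polynomial-style accumulation;
the multiplier 31 and the table size below are illustrative choices, not requirements.

def string_hash(text, table_size=1024):
    # Combine the numeric value of each character into a single integer,
    # then fold it into the range of valid array indexes.
    value = 0
    for character in text:
        value = value * 31 + ord(character)   # Polynomial-style accumulation.
    return value % table_size                 # Index into the hash table's array.

print(string_hash("openstax"))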

Hashing has a variety of applications spanning computer systems, database systems, computer security, and
searching algorithms. For example, hashing algorithms are often used in designing secure systems for
protecting stored passwords even after a security breach occurs. In the context of data structure problems,
hashing offers a different approach than binary search. Instead of relying on pairwise comparisons to narrow
down the expected location of an element in a sorted list in logarithmic time, hashing search algorithms can
instead directly index an element by hash value in constant time. If binary search trees implement sets and
maps by applying the concept of binary search, a hash table implements sets and maps by applying the
concept of hashing (Figure 3.25). Rather than organize elements in sorted order from left to right as in a binary
search tree, hash tables store and retrieve elements in an array indexed by hash value.


Figure 3.25 Hash table data structures apply hashing to implement abstract data types such as sets and maps, but must handle
collisions between elements that share the same hash value. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0
license)

Hashing search algorithms are often preferred over binary search algorithms for their runtime benefits, but
they come with unique drawbacks. Ideally, different objects would hash to different hash values. But the
combinatorial explosion of possibilities for unique strings or collections of complex data means that a
collision, or a situation in which multiple objects hash to the same integer index value, is inevitable since
integers in a computer can typically only represent a certain, fixed range of integer numbers. Combinatorial
explosion is thus a problem not only for the design of efficient algorithms, but also for the design of efficient
data structures.

Graph Problems
While data structure problems focus primarily on storage and retrieval of elements in a collection, graph
problems include a wide variety of computing problems involving the graph data structure. Unlike other data
structures, graphs include not only elements (vertices) but also relationships between elements (edges). Graph
problems often ask algorithm designers to explore the graph in order to answer questions about elements
and the relationships between elements.

Traversal
Traversal is the problem of exploring all the vertices in a graph. Graph data structures differ from tree data
structures in that there is no explicit root node to begin the traversal and edges can connect back to other
parts of the graph. In general, there is no guarantee of hierarchy in a graph. Graph traversal algorithms such
as depth-first search and breadth-first search begin at an arbitrary start vertex and explore outwards from the
start vertex while keeping track of a set of explored vertices.

Depth-First Search

A depth-first search is a graph traversal algorithm that recursively explores each neighbor, continuing as far as
possible along each subproblem depth-first (Figure 3.26). Explored vertices are added to a global set to ensure
that the algorithm only explores each vertex once. The runtime of depth-first search is in O(|V| + |E|) with
respect to |V|, the number of vertices, and |E|, the number of edges.

Figure 3.26 Depth-first search is a graph traversal algorithm that continues as far down a path as possible from a start vertex before
backtracking. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
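
A recursive Python sketch of depth-first search, assuming the graph is represented as a dictionary mapping
each vertex to a list of its neighbors, might look like this:

def depth_first_search(graph, start, explored=None):
    # Recursively explore each neighbor, going as far as possible before backtracking.
    if explored is None:
        explored = set()
    explored.add(start)                  # Ensure each vertex is explored only once.
    for neighbor in graph[start]:
        if neighbor not in explored:
            depth_first_search(graph, neighbor, explored)
    return explored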

Breadth-First Search

A breadth-first search iteratively explores each neighbor, expanding the search level-by-level breadth-first
(Figure 3.27). Explored vertices are also added to a global set to ensure that the algorithm explores each vertex
once and in the correct level-order. The runtime of breadth-first search is also in O(|V| + |E|) with respect to
|V|, the number of vertices, and |E|, the number of edges.
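
A corresponding Python sketch of breadth-first search, using a queue of discovered-but-unexpanded vertices
and the same dictionary representation of the graph, might look like this:

from collections import deque

def breadth_first_search(graph, start):
    # Iteratively explore neighbors level by level, expanding outward from the start.
    explored = {start}
    queue = deque([start])               # Vertices discovered but not yet expanded.
    while queue:
        vertex = queue.popleft()         # Closest unexpanded vertex first.
        for neighbor in graph[vertex]:
            if neighbor not in explored:
                explored.add(neighbor)
                queue.append(neighbor)
    return explored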

Graph traversal algorithms are the foundational algorithm design patterns for most graph processing
algorithms. Many algorithms require some amount of exploration, and that exploration typically starts at some
vertex and continues processing each reachable vertex at most once. A vertex is reachable if a path, or
sequence of edges, from the start vertex exists. As opposed to depth-first search, breadth-first search
has the benefit of exploring vertices closer to the start before exploring vertices further from the start, which
can be useful for solving problems such as unweighted shortest paths.

Figure 3.27 Breadth-first search is a graph traversal algorithm that explores level by level expanding outward from the start vertex.
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Minimum Spanning Trees


Minimum spanning trees is the problem of finding a lowest-cost way to connect all the vertices to each other,
where cost is the sum of the selected edge weights. The two canonical greedy algorithms for finding a
minimum spanning tree in a graph are Kruskal’s algorithm and Prim’s algorithm. Both algorithms repeatedly
apply the rule of selecting the next lowest-weight edge to an unconnected part of the graph. The output of a
minimum spanning tree algorithm is a set of |V| - 1 edges connecting all the vertices in the graph with the
least total sum of edge weights, where |V| is the number of vertices.

Kruskal’s Algorithm

Kruskal’s algorithm begins by considering each vertex as an independent "island," and the goal of the
algorithm is to repeatedly connect islands by selecting the lowest-cost edges. A specialized data structure
(called disjoint sets) is typically used to keep track of the independent islands. The runtime of Kruskal’s
algorithm is in O(|E| log |E|) with respect to |E|, the number of edges (Figure 3.28).

Figure 3.28 Kruskal’s algorithm repeatedly selects the next lightest edge that connects two independent "islands." (attribution:
Copyright Rice University, OpenStax, under CC BY 4.0 license)
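
A simplified Python sketch of Kruskal’s algorithm is shown below; it uses a bare-bones version of the disjoint
sets idea (without the optimizations a real implementation would add) and assumes edges are given as
(weight, u, v) tuples.

def kruskal(vertices, edges):
    # Sort edges by weight, then repeatedly connect two independent "islands."
    parent = {vertex: vertex for vertex in vertices}   # Each vertex starts alone.

    def find(vertex):
        # Follow parent links to identify the island containing this vertex.
        while parent[vertex] != vertex:
            vertex = parent[vertex]
        return vertex

    tree = []
    for weight, u, v in sorted(edges):                 # Lightest edges first.
        island_u, island_v = find(u), find(v)
        if island_u != island_v:                       # The edge joins two islands.
            parent[island_u] = island_v                # Merge them.
            tree.append((weight, u, v))
    return tree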

Prim’s Algorithm

Prim’s algorithm grows a minimum spanning tree one edge at a time by selecting the lowest-weight edge to
an unexplored vertex. The runtime of Prim’s algorithm is in O(|E| log |V| + |V| log |V|) with respect to |V|,
the number of vertices, and |E|, the number of edges (Figure 3.29).

Figure 3.29 Prim’s algorithm expands outward from the start vertex by repeatedly selecting the next lightest edge to an unreached
vertex. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Shortest Paths
The output of a shortest paths algorithm is a shortest paths tree, the lowest-cost way to get from one vertex
to every other vertex in a graph (Figure 3.30). The unweighted shortest path is the problem of finding the
shortest paths in terms of the number of edges. Given a start vertex, breadth-first search can compute the
unweighted shortest paths tree from the start vertex to every other reachable vertex in the graph by
maintaining a map data structure of the path used to reach each vertex.

Figure 3.30 Three shortest paths trees of the lowest-cost way to get from the start vertex to every other vertex in the graph are
shown. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Weighted Shortest Path

A weighted shortest path is the problem of finding the shortest paths in terms of the sum of the edge
weights. Earlier, we introduced Prim’s algorithm, a minimum spanning tree algorithm that maintains a priority
queue of vertices in the graph ordered by connecting edge weight and repeatedly selects the next lowest-cost
edge to an unconnected part of the graph.

THINK IT THROUGH

Weighted Shortest Paths for Navigation Directions


One of the most direct applications of the shortest paths problem is to provide recommended routes for
navigation directions in real-world mapping. Many applications use distance as the metric for edge weight,
so the shortest path between two points represents the real-world route with the smallest distance
between the two places.

What does a distance metric not consider when providing a recommended route? What values are centered
and emphasized by even using a shortest paths algorithm for recommending routes?

Dijkstra’s Algorithm

Dijkstra’s algorithm maintains a priority queue of vertices in the graph ordered by distance from the start
and repeatedly selects the next shortest path to an unconnected part of the graph. Dijkstra’s algorithm is
almost identical to Prim’s algorithm except that it processes shortest paths (sequences of edges) rather than
individual edges. Dijkstra’s algorithm grows a shortest paths tree one shortest path at a time by selecting the
next shortest path to an unexplored vertex. The runtime of Dijkstra’s algorithm is in O(|E| log |V| + |V| log
|V|) with respect to |V|, the number of vertices, and |E|, the number of edges (Figure 3.31).

Figure 3.31 Dijkstra’s algorithm expands outward from the start vertex by repeatedly selecting the next lowest-cost path to an
unreached vertex. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)
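
A Python sketch of Dijkstra’s algorithm using a binary heap as the priority queue might look like the following;
it assumes the graph maps each vertex to a list of (neighbor, edge weight) pairs with nonnegative weights, and
it returns only the distances rather than the full shortest paths tree.

import heapq

def dijkstra(graph, start):
    # Grow the set of finished vertices one shortest path at a time.
    distances = {start: 0}
    queue = [(0, start)]                       # (distance from start, vertex)
    while queue:
        distance, vertex = heapq.heappop(queue)
        if distance > distances.get(vertex, float("inf")):
            continue                           # A shorter path was already found.
        for neighbor, weight in graph[vertex]:
            candidate = distance + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances                           # Shortest distance to each reachable vertex.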


3.6 Computer Science Theory


Learning Objectives
By the end of this section, you will be able to:
• Understand the models and limits of computing
• Relate Turing machines to algorithms
• Describe complexity classes
• Interpret NP-completeness
• Differentiate between P and NP

Throughout this chapter, we have introduced several techniques and canonical case studies for the design and
analysis of algorithms—oftentimes focusing on the ideas and details behind individual algorithms. But the
study of algorithms is more than just the study of individual algorithms, algorithm design, or even algorithm
analysis.

Models of Computation
Computers include basic algorithms for solving problems like adding, subtracting, or comparing two numbers.
Computers, owing to their roots in calculators, are optimized to solve these problems; these basic algorithms
are constant-time operations. What programming offers is the ability to define our own algorithms that can be
used to solve more complex problems, such as searching, sorting, and hashing. Unfortunately, these
programmed algorithms are not as fast as basic operations. We have even seen certain problems deal with a
combinatorial explosion in the number of potential solutions. For many of these problems, the best-known
algorithms do not do much better than brute-force, which takes exponential time.

In the bigger picture, computer science engages the central question of how humans can encode intelligence.
Our discussion of algorithm design grounded the activity in problem modeling, the process of encoding a
complex real-world phenomenon or problem in a more abstract or simple form. How is problem modeling
constrained by the model of computation, or the rules of the underlying computer that executes an algorithm?
Why are certain problems challenging for computers to execute?

Combinatorial explosion poses a problem for computer algorithms because our model of computation
assumes computers only have a single thread of execution and only execute one basic operation on each step.
If we overturn some part of this assumption, either by creating computers with multiple processors or by
creating more sophisticated operations, then it might be possible to deal with combinatorial explosion. Almost
all of today’s computer hardware, ranging from massive supercomputers to handheld smartphones, rely at
least to some degree on expanding the model of computation to compute solutions to problems more
efficiently. Even so, much of today’s computer hardware still relies on the same fundamental programming
assumptions: that there are variables to represent data and arithmetic or logical operations.

Turing Machines
In the 1800s, Charles Babbage imagined a mechanical machine—the Analytical Engine—that could
automatically calculate mathematical formulas. Ada Lovelace then extrapolated that the Analytical Engine
could solve more general algorithms by using loops to repeat processes and variables to represent data.
Lovelace’s vision of algorithms represented a synthesis between human intuition and mathematical reasoning.
In the mid-1900s, Lovelace’s ideas inspired Alan Turing to imagine a more general notion of algorithms and
machines that could run those algorithms. A Turing machine is an abstract model of computation for
executing any computer algorithm. A Turing machine describes computers in terms of three key ideas:

1. a memory bank for storing data.
2. an instruction table, where each instruction can either:
   a. store a value to the memory bank.
   b. retrieve a value from the memory bank.
   c. perform a basic operation on a value.
   d. set which instruction will be executed next by modifying the program counter.
3. a program counter that keeps track of the current instruction in the instruction table.

A Turing machine executes a computer algorithm by following each instruction specified by the program
counter. An algorithm can use these basic operations to compute the sum of 1 and 1.

1. Store the value 1 to address A in the memory bank.
2. Store the value 1 to address B in the memory bank.
3. Add the values at addresses A and B and then store the result at address A.

What makes computers useful is not just the fact that they can calculate numbers, but that they can encode
logic in the instructions. Instead of just computing the sum of 1 and 1, this program continues adding 1 to a
growing sum stored at address A.

1. Store the value 1 to address A in the memory bank.
2. Store the value 1 to address B in the memory bank.
3. Add the values at addresses A and B and then store the result at address A.
4. Set the program counter to execute step 3 next.
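
The looping program above can be mimicked with a small simulation of a memory bank, an instruction table,
and a program counter. The sketch below is our own illustration, not a formal Turing machine; it caps
execution at a fixed number of steps so the loop terminates.

def run(instructions, steps=10):
    # A memory bank, an instruction table, and a program counter.
    memory = {}
    counter = 0
    for _ in range(steps):                        # Cap execution so the loop stops.
        if counter >= len(instructions):
            break
        jump = instructions[counter](memory)      # Execute the current instruction.
        counter = jump if jump is not None else counter + 1
    return memory

# The looping program from the text: keep adding 1 to the sum stored at address A.
program = [
    lambda memory: memory.update(A=1),                          # 1. Store 1 at address A.
    lambda memory: memory.update(B=1),                          # 2. Store 1 at address B.
    lambda memory: memory.update(A=memory["A"] + memory["B"]),  # 3. Add A and B, store at A.
    lambda memory: 2,                                           # 4. Jump back to step 3 (index 2).
]
print(run(program)["A"])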

The Turing machine abstract model of computation assumes a single thread of execution following each
instruction in an algorithm. Although today’s computers are much more efficient than the first computers that
realized the Turing machine, most computers still rely on the same fundamental assumptions about how to
execute algorithms. The O(N)-time sequential search algorithm, though it might execute 1,000 times faster on
today’s computers, still grows linearly with respect to the size of the input. An O(2^N)-time brute-force
algorithm, though it might execute 1,000 times faster on today’s computers, still grows exponentially with
respect to the size of the input. Even as computers become faster over time, inefficient algorithms still cannot
be used to solve any problems larger than a few thousand elements.

Complexity Classes
One subfield of computer science is theoretical computer science, which studies models of computation, their
application to algorithms, and the complexity of problems. The complexity of a problem is the complexity (in
terms of time or memory resources required) of the most efficient algorithms for solving the problem.
Theoretical computer scientists are interested in understanding the difficulty of a problem in terms of time
complexity, space complexity, and some other complexity measures.

In this chapter, we have focused on solving problems known to have polynomial time algorithms. Searching,
sorting, hashing, traversal, minimum spanning trees, or shortest paths are all examples of problems in the
polynomial (P) time complexity class because they are all problems that have runtimes with a polynomial
expression such as O(1), O(log N), O(N), O(N log N), O(N^2), O(N^3). In general, these problems are considered
tractable because computers can solve them in a reasonable amount of time. But there are many problems
that are considered intractable because they do not have efficient, polynomial-time algorithms.

The nondeterministic polynomial (NP) time complexity class refers to all problems that can be solved in
polynomial time by a nondeterministic algorithm. A nondeterministic algorithm is a special kind of Turing
machine, which at each step can nondeterministically choose which instruction to execute, and is considered
to successfully find a solution if any combination of these nondeterministic choices eventually leads to a correct
solution. In other words, in contrast to a deterministic algorithm, such as a greedy algorithm, which must
repeatedly apply a simple rule to deterministically select the next element in a solution, a nondeterministic
algorithm is able to simultaneously explore all the possible choices. We do not yet have computers that can
execute nondeterministic algorithms, but if we did, then we would be able to efficiently solve any
combinatorial problem by relying on the special power of nondeterminism.


Technically, all P problems are also NP problems because we already have deterministic algorithms for solving
them and therefore do not need to rely on the special power of nondeterminism. For example, Dijkstra’s
algorithm provides a deterministic polynomial-time solution to the shortest paths problem by building up a
shortest paths tree from the start vertex outward. This application of the greedy algorithmic paradigm relies
on the structure of the shortest paths tree, since the shortest path to a point further away from the start must
build on the shortest path to a point closer to the start.

NP-complete Problems
NP-complete refers to all the hardest NP problems—the combinatorial problems for which we do not have
deterministic polynomial-time algorithms. More precisely, a problem PI is said to be NP-complete if PI is in NP
and for every problem in NP, there is a reduction that reduces the problem to PI. For example, a longest path,
or the problem of finding the highest-cost way to get from one vertex to another without repeating vertices, is
an NP-complete problem that is, in a sense, the opposite of shortest paths (Figure 3.32).
to solve than shortest paths? For one, there is no underlying structure to the solution that we can use to
repeatedly apply a simple rule as in a greedy algorithm. With the shortest paths problem, we could rely on the
shortest paths tree to inform the solution. But in longest paths, the goal is to wander around the graph. The
longest path between any two vertices will probably involve traversing as many edges as possible to maximize
distance, visiting many vertices along the way. For some graphs, the longest paths might even visit all the
vertices in the graph. In this situation, the longest paths do not form a tree and instead involve ordering all the
vertices in the graph for each longest path. Identifying the correct longest path then requires listing out all the
possible paths in the graph—a combinatorial explosion in the combinations of edges and vertices that can be
selected to form a solution.

Figure 3.32 The longest path in a graph maximizes the distance, which often (but not always) involves visiting many vertices along
the way. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

Related to the problem of longest paths is the traveling salesperson problem (TSP), which is the problem of,
given the path cost of every pair of vertices, finding the lowest-cost tour, or path from a start vertex visiting
every vertex in the graph once including a return to the start vertex. Compared to the TSP, which is finding the
lowest-cost tour, the longest paths problem is like finding the highest-cost tour. What makes both these
problems difficult is that we do not have a simple rule for selecting the next element to include in the solution.
Unless we have the special power of nondeterminism, it is hard to tell from the beginning which edge is the
right one to include in the final solution. Applying a simple rule like, “Select the edge with the lowest cost,”
might not necessarily lead to the overall lowest-cost tour. Even though this simple rule worked for the problem
of minimum spanning trees, the additional restriction of a tour rather than a tree makes the TSP a much
harder problem to solve efficiently. Although we have efficient algorithms for shortest paths, we do not have
efficient algorithms for the shortest tour (TSP).

INDUSTRY SPOTLIGHT

Delivery Logistics
Companies such as Amazon, FedEx, UPS, and others that rely on logistics to deliver goods to various
locations seek to optimize the sequence of stops to save costs. The only way to achieve this would be to rely
on an optimal algorithm for the traveling salesperson problem. The TSP aims to find the minimum distance
tour that visits all the vertices such that no vertex is visited more than once. But this is not a perfect match
for real-world delivery logistics. Fuel or battery efficiency, for example, is not just about distance traveled,
but also the speed of travel, the time spent idling, and even the way that the route is organized. In the
United States, where vehicles drive on the right side of the road, safety can be improved by reducing the
number of left-hand turns across the divider. Drivers might also want to take breaks during a trip. Modeling
all these factors requires carefully formulating the problem and considering the limits of TSP.

How do we know if a problem is NP-complete? Earlier, we introduced reduction as an algorithm paradigm that
is not only useful for solving problems, but also for relating the difficulty of problems. A reduction from a
difficult problem A to another problem B proves that B is at least as difficult as A. It turns out that every
NP-complete problem can be reduced to every other one, so an efficient algorithm for solving any NP-complete
problem would yield an efficient algorithm for every NP-complete problem.

LINK TO LEARNING

The longest paths problem and the traveling salesperson problem are just two examples of NP-complete
problems. Visit A Graph of NP-Complete Reductions (https://openstax.org/r/76NPCompReuct) to visualize
many NP-complete problems and their relationships to each other. For example, the longest paths problem
(LPT) and the traveling salesperson problem (TS) both reduce to Hamiltonian paths (HP). In turn,
Hamiltonian paths (HP) reduce to vertex cover (VC) which reduces to 3-satisfiability (3-SAT).

P versus NP
Longest paths and TSP are just two among thousands of NP-complete problems for which we do not have
efficient algorithms. The question of P versus NP asks whether it is possible to design a deterministic
polynomial-time algorithm for solving any—and therefore all—NP-complete problems. There are two
possible answers to the question of P versus NP:

• If P = NP, then there is a deterministic polynomial-time algorithm for solving all NP-complete problems.
• If P ≠ NP, then there are NP-complete problems that cannot be solved with a deterministic polynomial-
time algorithm.

Most theoretical computer scientists believe that P ≠ NP; in other words, it is impossible to design a
deterministic polynomial-time algorithm for longest paths, TSP, or any other NP-complete problem. However,
they do not have any definite proof that P = NP or P ≠ NP. An efficient algorithm for any one NP-complete
problem would not only directly solve routing and logistics problems but would also enable massive
advancements in drug discovery through scientific simulation, for instance. It would also break essentially all
modern Internet security and password systems—among thousands of other problems.


Chapter Review
Key Terms
abstract data type (ADT) consists of all data structures that share common functionality
abstraction process of simplifying a concept in order to represent it in a computer
adjacent in a graph abstract data type, the relationship between two vertices connected by an edge
algorithm analysis study of the results produced by the outputs as well as how the algorithm produces
those outputs
algorithm design pattern solution to well-known computing problems
algorithmic paradigm common concept and ideas behind algorithm design patterns
algorithmic problem solving refers to a particular set of approaches and methods for designing algorithms
that draws on computing’s historical connections to the study of mathematical problem solving
array list data structure that stores elements next to each other in memory
asymptotic analysis evaluates the time that an algorithm takes to produce a result as the size of the input
increases
asymptotic notation mathematical notation that formally defines the order of growth
AVL tree balanced binary search tree data structure often used to implement sets or maps that organizes
elements according to the AVL tree property
AVL tree property requires the left and right subtrees to be balanced at every node in the tree
balanced binary search tree introduces additional properties that ensure that the tree will never enter a
worst-case situation by reorganizing elements to maintain balance
Big O notation most common type of asymptotic notation in computer science used to measure worst case
complexity
binary heap binary tree data structure used to implement priority queues that organizes elements according
to the heap property
binary logarithm tells how many times a large number needs to be divided by 2 until it reaches 1
binary search algorithm recursively narrows down the possible locations for the target in the sorted list
binary search tree tree data structure often used to implement sets and maps that organizes elements
according to the binary tree property and the search tree property
binary tree property requires that each node can have either zero, one, or two children
breadth-first search iteratively explores each neighbor, expanding the search level-by-level breadth-first
brute-force algorithm solves combinatorial problems by systematically enumerating all potential solutions
in order to identify the best candidate solution
canonical algorithm well-known algorithm
case analysis way to account for variation in runtime based on factors other than the size of the problem
child node descendant of another node
collision situation where multiple objects hash to the same integer index value
combinatorial explosion exponential number of solutions to a combinatorial problem that makes brute-
force algorithms unusable in practice
combinatorial problem involves identifying the best candidate solution out of a space of many potential
solutions
comparison sorting sorting of a list of elements where elements are not assigned numeric values but rather
defined in relation to other elements
complexity condition based on the degree of computational resources that an algorithm consumes during
its execution in relation to the size of the input
compression problem of representing information using less data storage
constant type of order of growth that does not take more resources as the size of the problem increases
correctness whether the outputs produced by an algorithm match the expected or desired results across the
range of possible inputs

cost model characterization of runtime in terms of more abstract operations such as the number of
repetitions
count sorting sorting a list of elements by organizing elements into categories and rearranging the
categories into a logical order
cryptography problem of masking or obfuscating text to make it unintelligible
data structure complex data type with specific representation and specific functionality
data structure problem computational problem involving the storage and retrieval of elements for
implementing abstract data types such as lists, sets, maps, and priority queues
data type determines how computers process data by defining the possible values for data and the possible
functionality or operations on that data
depth-first search graph traversal algorithm that recursively explores each neighbor, continuing as far as
possible along each subproblem depth-first
Dijkstra’s algorithm maintains a priority queue of vertices in the graph ordered by distance from the start
and repeatedly selects the next shortest path to an unconnected part of the graph
divide and conquer algorithm algorithmic paradigm that breaks down a problem into smaller subproblems
(divide), recursively solves each subproblem (conquer), and then combines the result of each subproblem in
order to inform the overall solution
edge relationship between vertices or nodes
element individual data point
experimental analysis evaluates an algorithm’s runtime by recording how long it takes to run a program
implementation of it
functionality operations such as adding, retrieving, and removing elements
graph represents binary relations among collection of entities, specifically vertices and edges
graph problem computational problem involving graphs that represent relationships between data
greedy algorithm solves combinatorial problems by repeatedly applying a simple rule to select the next
element to include in the solution
hash table implements sets and maps by applying the concept of hashing
hashing problem of assigning a meaningful integer index for each object
heap property requires that the priority value of each node in the heap is greater than or equal to the
priority values of its children
heapsort algorithm adds all elements to a binary heap priority queue data structure using the comparison
operation to determine priority value and returns the sorted list by repeatedly removing from the priority
queue element-by-element
index position or address for an element in a list
interval scheduling problem combinatorial problem involving a list of scheduled tasks with the goal of
finding the largest non-overlapping set of tasks that can be completed
intractable problems that do not have efficient, polynomial-time algorithms
Kruskal’s algorithm greedy algorithm that sorts the list of edges in the graph by weight
leaf node node at the bottom of a tree that has no children
linear type of order of growth where the resources required to run the algorithm increase at about the
same rate as the size of the problem increases
linear data structure category of data structures where elements are ordered in a line
linked list data structure that does not necessarily store elements next to each other and instead works by
maintaining, for each element, a link to the next element in the list
list ordered sequence of elements and allows adding, retrieving, and removing elements from any position in
the list
logarithm tells how many times a large number needs to be divided by a small number until it reaches 1
longest path problem of finding the highest-cost way to get from one vertex to another without repeating
vertices
map represents unordered associations between key-value pairs of elements, where each key can only
appear once in the map


matching problem of searching for a text pattern within a document
merge sort canonical divide and conquer algorithm for comparison sorting
minimum spanning tree problem of finding a lowest-cost way to connect all the vertices to each other
model of computation rules of the underlying computer that is ultimately responsible for executing the
algorithm
node represents an element in a tree or graph
nondeterministic algorithm special kind of Turing machine, which at each step can non-deterministically
choose which instruction to execute and is considered to successfully find a solution if any combination of
these nondeterministic choices leads to a correct solution
nondeterministic polynomial (NP) time complexity class all problems that can be solved in polynomial
time by a nondeterministic algorithm
NP-complete all the hardest NP problems—the combinatorial problems for which we do not have
deterministic polynomial-time algorithms
order of growth geometric prediction of an algorithm’s time or space complexity as a function of the size of
the problem
perfectly balanced for every node in the binary search tree, its left and right subtrees contain the same
number of elements
polynomial (P) time complexity class all problems that have runtimes described with a polynomial
expression such as O(1), O(log N), O(N), O(N log N), O(N^2), O(N^3)
Prim’s algorithm greedy algorithm that maintains a priority queue of vertices in the graph ordered by
connecting edge weight
priority queue represents a collection of elements where each element has an associated priority value
problem task with specific input data and output data corresponding to each input
problem model simplified, abstract representation of a more complex real-world problem
program realization or implementation of an algorithm written in a formal programming language
quicksort algorithm recursively sorts data by applying the binary search tree algorithm design pattern to
partition data around pivot elements
reachable vertex vertex that can be reached if a path or sequence of edges from the start vertex exists
reduction algorithm solves problems by transforming them into other problems
representation particular way of organizing a collection of elements
root node node at the top of the tree
runtime analysis study of how much time it takes to run an algorithm
search tree property requires that elements in the tree are organized least-to-greatest from left-to-right
searching problem of retrieving a target element from a collection of elements
sequential search algorithm searching algorithm that sequentially checks the collection element-by-
element for the target
set represents an unordered collection of unique elements and allows adding, retrieving, and removing
elements from the set
shortest path problem of finding a lowest-cost way to get from one vertex to another
shortest paths tree output of the shortest paths problem, the lowest-cost way to get from one vertex to
every other reachable vertex in a graph
sorting problem of rearranging elements into a logical order
space complexity formal measure of how much memory an algorithm requires during its execution as it
relates to the size of the problem
step basic operation in the computer, such as looking up a single value, adding two values, or comparing two
values
string problem computational problem involving text or information represented as a sequence of
characters
subproblem smaller instance of a problem that can be solved independently

time complexity formal measure of how much time an algorithm requires during execution as it relates to
the size of the problem
tractable problems that computers can solve in a reasonable amount of time
traveling salesperson problem (TSP) problem of, given the path cost of every pair of vertices, finding the
lowest-cost tour, or path from a start vertex visiting every vertex in the graph once including a return to the
start vertex
traversal problem of exploring all the vertices in a graph
tree hierarchical data structure
Turing machine abstract model of computation for executing any computer algorithm
unweighted shortest path problem of finding the shortest paths in terms of the number of edges
vertex represents an element in a graph or a special type of graph such as a tree
weighted shortest path problem of finding the shortest paths in terms of the sum of the edge weights

Summary
3.1 Introduction to Data Structures and Algorithms
• Data structures represent complex data types for solving real-world problems. Data structures combine
specific data representations with specific functionality.
• Abstract data types categorize data structures according to their functionality and ignore differences in
data representation. Abstract data types include lists, sets, maps, priority queues, and graphs.
• To select an appropriate data structure, first select an abstract data type according to the problem
requirements. Then, select an appropriate data structure implementation for the abstract data type.
• Linear data structures organize elements in a line, ideal for implementing the list abstract data type. Linear data structures include array lists and linked lists (a short sketch contrasting the two follows this list).
• Linear data structures can implement any abstract data type. The study of data structures in general
focuses on opportunities to improve efficiency (in terms of execution time or memory usage) over linear
data structures.
• Tree data structures organize elements in a hierarchy of levels defined by parent-child relationships. Trees
are defined with a root node at the top of the tree, parent-child relationships between each level, and leaf
nodes at the bottom of the tree.
• Binary search trees require that elements in the tree are organized least-to-greatest from left-to-right.
Binary search trees are often used to implement the set and map abstract data types.
• Balanced binary search trees and binary heaps represent two approaches for avoiding the worst-case
situation with binary search trees. Binary heaps are often used to implement the priority queue abstract
data type.
• Graph data structures focus on explicitly modeling the relationships between elements. Graphs afford
access not only to elements, but also to the relationships between elements.
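
To make the contrast between array lists and linked lists concrete, the following short C sketch stores the same three elements both ways. It is a minimal illustration only; the struct and variable names are made up for this example and are not part of any particular library.

#include <stdio.h>
#include <stdlib.h>

/* A linked list keeps each element in its own node along with a link
   (pointer) to the next node, so elements need not sit next to each
   other in memory. */
struct node {
    int value;
    struct node *next;
};

int main(void) {
    /* Array list: elements stored contiguously and accessed by index. */
    int array_list[3] = {10, 20, 30};

    /* Linked list: the same elements, each in a separately allocated node. */
    struct node *third = malloc(sizeof *third);
    struct node *second = malloc(sizeof *second);
    struct node *first = malloc(sizeof *first);
    third->value = 30;  third->next = NULL;
    second->value = 20; second->next = third;
    first->value = 10;  first->next = second;

    printf("array_list[1] = %d\n", array_list[1]);        /* direct index */
    printf("second element = %d\n", first->next->value);  /* follow links */

    free(first);
    free(second);
    free(third);
    return 0;
}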

3.2 Algorithm Design and Discovery


• Just as many data structures can represent the same abstract data type, many algorithms exist to
solve the same problem. In algorithmic problem-solving, computer scientists solve formal problems with
specific input data and output data that correspond to each input.
• Modeling is the process of representing a complex phenomenon such as a real-world problem as a formal
problem. Modeling is about abstraction: the simplification or erasure of details so that the problem can be
solved by a computer.
• Historically, the model of computation emphasized specialized algorithms operating on a modest model of the underlying phenomenon. Modeling is a violent but necessary act: it simplifies the problem so that it can be solved by a computer.
• Searching is the problem of retrieving a target element from a collection of many elements. Sequential search and binary search are two algorithms for solving the search problem (a binary search sketch follows this list).
• To solve real-world problems, computer scientists compose, modify, and apply algorithm design patterns, such as search algorithms.


• Algorithm analysis is the study of the outputs produced by an algorithm as well as how the algorithm
produces those outputs.
• Correctness considers whether the outputs produced by an algorithm match the expected or desired
results across the range of possible inputs. Correctness is defined as a match between the algorithm and
the model of the problem, not between the algorithm and the real-world.
• Correctness is complicated by the complexity of social relationships, power, and inequity in the real-world.
Since algorithms automate processes and operate in existing power structures, they are likely to
reproduce and amplify social injustice.
• In addition to correctness, computer scientists are also interested in complexity, or measuring the
computational resources that an algorithm consumes during its execution in relation to the size of the
input.
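
As a concrete instance of the search algorithms mentioned in this summary, the C sketch below shows a standard iterative binary search over a sorted array. It is a generic illustration rather than this book's own implementation, and the array contents are arbitrary.

#include <stdio.h>

/* Binary search: repeatedly halve the sorted range until the target is
   found or the range becomes empty. Returns the index of target, or -1. */
int binary_search(const int *sorted, int n, int target) {
    int lo = 0;
    int hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  /* midpoint without overflow */
        if (sorted[mid] == target) {
            return mid;
        } else if (sorted[mid] < target) {
            lo = mid + 1;              /* discard the left half */
        } else {
            hi = mid - 1;              /* discard the right half */
        }
    }
    return -1;                         /* target is not in the array */
}

int main(void) {
    int data[] = {2, 3, 5, 7, 11, 13, 17};
    printf("index of 11 = %d\n", binary_search(data, 7, 11));
    printf("index of 4 = %d\n", binary_search(data, 7, 4));
    return 0;
}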

3.3 Formal Properties of Algorithms


• Runtime analysis is a study of how much time it takes to run an algorithm. Experimental analysis is a
runtime analysis technique that involves evaluating an algorithm’s runtime by recording how long it takes
to run a program implementation of it.
• Time complexity is the formal measure of how much time an algorithm requires during execution as it
relates to the size of the problem. The goal of time complexity analysis is to produce a simple and easy-to-
compare characterization of the runtime of an algorithm as it relates to the size of the problem.
• Space complexity is the formal measure of how much memory an algorithm requires during execution as
it relates to the size of the problem.
• Steps in time complexity analysis are to identify a metric for representing the size of the problem; to model the number of steps needed to execute the algorithm; and to formalize the model using either precise English or asymptotic notation to define the order of growth. Big O notation is the most common type of asymptotic notation in computer science. A short step-counting sketch follows this list.
• Differences in orders of growth are massive: as the input size grows, the difference between orders of
growth becomes more and more vast. For problems dealing with just 1,000 elements, the time it would
take to run an exponential-time algorithm on that problem exceeds the current age of the
universe—whereas that same-size problem running on the same computer would take just 1 second on a
quadratic-time algorithm.
• In practice, across applications working with large amounts of data, O(N2) is often considered the limit for
real-world algorithms. For algorithms that need to run frequently on large amounts of data, algorithm
designers target O(N), O(log N), or O(1).
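
To connect step counting with orders of growth, the short C sketch below counts the basic steps performed by a linear pass and by a quadratic pairwise pass over the same N items. With N = 1,000 the loops perform roughly one thousand and one million steps, respectively, mirroring the gap between O(N) and O(N2) described above. The numbers are illustrative only.

#include <stdio.h>

int main(void) {
    long n = 1000;
    long linear_steps = 0;
    long quadratic_steps = 0;

    /* Linear work: one basic step per element, about N steps total. */
    for (long i = 0; i < n; i++) {
        linear_steps++;
    }

    /* Quadratic work: one basic step per pair of elements, about N * N steps. */
    for (long i = 0; i < n; i++) {
        for (long j = 0; j < n; j++) {
            quadratic_steps++;
        }
    }

    printf("N = %ld: linear loop = %ld steps, quadratic loop = %ld steps\n",
           n, linear_steps, quadratic_steps);
    return 0;
}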

3.4 Algorithmic Paradigms


• Algorithmic paradigms are the common concepts and ideas behind algorithm design patterns, such as
divide and conquer algorithms, brute-force algorithms, greedy algorithms, and reduction algorithms.
• Divide and conquer algorithms break down a problem into smaller subproblems (divide), recursively solve
each subproblem (conquer), and then combine the result of each subproblem to inform the overall
solution. Recursion is an algorithm idea fundamental to divide and conquer algorithms that solves
complex problems by dividing input data into smaller, independent instances of the same problem known
as subproblems.
• Binary search is an example of a divide and conquer algorithm with a single recursive subproblem. Merge
sort is an example of a divide and conquer algorithm with two recursive subproblems.
• Brute-force algorithms solve combinatorial problems by systematically enumerating all potential solutions
in order to identify the best candidate solution. Combinatorial problems identify the best candidate
solution out of a space of many potential solutions.
• Brute-force algorithms exist for every combinatorial problem, but they are not typically used in practice
because of long run time issues. To enumerate all potential solutions, a brute-force algorithm must
generate every possible combination of the input data.

• Greedy algorithms solve combinatorial problems by repeatedly applying a simple rule to select the next
element to include in the solution. Unlike brute-force algorithms that solve combinatorial problems by
generating all potential solutions, greedy algorithms instead focus on generating just one solution.
• Greedy algorithms are not always guaranteed to compute the best solution, depending on the assumptions and goals of the problem. A greedy algorithm for the interval scheduling problem will not compute the correct result if we choose to complete the shortest tasks (a sketch of a rule that does maximize completed tasks follows this list).
• Kruskal’s algorithm and Prim’s algorithm are two examples of greedy algorithms for the minimum
spanning trees problem. These algorithms are a rare example of a greedy algorithm that is guaranteed to
compute the correct result.
• Reduction algorithms solve problems by transforming them into other problems. In other words,
reduction algorithms delegate most of the work of solving the problem to another algorithm meant for a
different problem.
• Reduction algorithms allow algorithm designers to rely on optimized canonical algorithms rather than
designing a solution by composing algorithm design patterns, which can lead to performance or
correctness bugs. Reduction algorithms also enable computer scientists to make claims about the relative
difficulty of a problem.
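
The interval scheduling bullet above notes that the quality of a greedy algorithm depends on the rule it applies. The C sketch below uses the classic "earliest finish time first" rule, which does maximize the number of non-overlapping tasks; the task data here is invented purely for illustration.

#include <stdio.h>
#include <stdlib.h>

struct task {
    int start;
    int finish;
};

/* Comparator for qsort: order tasks by increasing finish time. */
static int by_finish(const void *a, const void *b) {
    const struct task *x = a;
    const struct task *y = b;
    return x->finish - y->finish;
}

int main(void) {
    struct task tasks[] = {
        {1, 4}, {3, 5}, {0, 6}, {5, 7}, {3, 9}, {6, 10}, {8, 11}
    };
    int n = sizeof tasks / sizeof tasks[0];

    qsort(tasks, n, sizeof tasks[0], by_finish);

    /* Greedy rule: repeatedly take the task that finishes earliest and
       does not overlap any task already selected. */
    int selected = 0;
    int last_finish = 0;
    for (int i = 0; i < n; i++) {
        if (tasks[i].start >= last_finish) {
            selected++;
            last_finish = tasks[i].finish;
            printf("take task [%d, %d)\n", tasks[i].start, tasks[i].finish);
        }
    }
    printf("%d non-overlapping tasks selected\n", selected);
    return 0;
}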

3.5 Sample Algorithms by Problem


• Data structure problems focus on the storage and retrieval of elements for implementing abstract data
types such as lists, sets, maps, and priority queues. Data structure problems include sorting, searching,
and hashing.
• Searching is the problem of retrieving a target element from a collection of elements. Searching in a linear
data structure such as an array list can be done using either sequential search or binary search.
• Sorting is the problem of rearranging elements into a logical order, typically from least-valued (smallest) to
greatest-valued (largest). Sorting is a fundamental problem not only because of the tasks that it directly
solves, but also because it is a foundation for many other algorithms such as the binary search algorithm
or Kruskal’s algorithm for the minimum spanning tree problem.
• Merge sort and quicksort are two examples of divide and conquer algorithms for sorting. Heapsort is a
sorting algorithm that relies on adding to a heap and then repeatedly removing each element in sorted
order.
• Hashing is the problem of assigning a meaningful integer index (hash value) for each object. Hash tables are a data structure for implementing sets and maps by applying the concept of hashing (a small hashing sketch follows this list).
• Graph problems include a wide variety of problems involving the graph data type. Graph problems include
traversal, minimum spanning trees, and shortest paths.
• Traversal is the problem of exploring all the vertices in a graph. Depth-first search and breadth-first search
are both graph traversal algorithms that expand outward from a start vertex, ultimately visiting every
reachable vertex.
• Minimum spanning trees is the problem of finding a lowest-cost way to connect all the vertices to each
other, where cost is the sum of the selected edge weights. The two canonical greedy algorithms for finding
a minimum spanning tree in a graph are Kruskal’s algorithm and Prim’s algorithm.
• Shortest paths is the problem of finding a lowest-cost way to get from one vertex to another. The output of
a shortest paths algorithm is a shortest paths tree from the start vertex to every other vertex in the graph.
• Breadth-first search computes the unweighted shortest paths tree, the shortest paths in terms of the
number of edges. Dijkstra’s algorithm computes the weighted shortest paths tree, the shortest paths in
terms of the sum of the edge weights.
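
To illustrate the hashing idea in this summary, the C sketch below computes a simple polynomial hash of a string and reduces it to a bucket index in a small table. The multiplier 31 and the table size of 8 are arbitrary choices for the example, not a prescribed design, and a real hash table would also need a strategy for handling collisions.

#include <stdio.h>

#define TABLE_SIZE 8

/* One simple hash function: combine the characters of the key with a
   small multiplier, then reduce modulo the table size to get a bucket. */
unsigned bucket_for(const char *key) {
    unsigned long h = 0;
    for (const char *p = key; *p != '\0'; p++) {
        h = h * 31 + (unsigned char)*p;
    }
    return (unsigned)(h % TABLE_SIZE);
}

int main(void) {
    const char *keys[] = {"map", "set", "list", "graph"};
    for (int i = 0; i < 4; i++) {
        printf("%-5s -> bucket %u\n", keys[i], bucket_for(keys[i]));
    }
    return 0;
}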

3.6 Computer Science Theory


• Problem modeling is constrained by the model of computation, or the rules of the underlying computer
that is ultimately responsible for executing the algorithm. Combinatorial explosion poses a problem for
computer algorithms because our model of computation assumes computers only have a single thread of
execution and only execute one basic operation on each step.
• A Turing machine is an abstract model of computation for executing any computer algorithm. A Turing
machine describes computers in terms of three key ideas: a memory bank, an instruction table, and a
program counter.
• Although today’s computers are much more efficient than the first computers that realized the Turing
machine, most computers still rely on the same fundamental assumptions about how to execute
algorithms. Even as computers become faster over time, inefficient algorithms still cannot be used to solve
any problems larger than a few thousand elements.
• The complexity of a problem is the complexity (i.e., the time or memory resources required) of the most
efficient algorithms for solving the problem. In this chapter, we have focused on solving problems known
to have polynomial time algorithms that can be described with a polynomial expression such as O(1),
O(log N), O(N), O(N log N), O(N2), O(N3).
• Nondeterministic polynomial (NP) time complexity class refers to all problems that can be solved in
polynomial time by a nondeterministic algorithm. A nondeterministic algorithm is a kind of algorithm that
can rely on the special power of exploring infinitely many possible “alternate universes” in order to
complete a computation.
• Technically, all P problems are also NP problems because we already have deterministic algorithms for
solving them and therefore do not need to rely on the special power of nondeterminism. NP-complete
refers to all the hardest NP problems—the combinatorial problems for which we do not have deterministic
polynomial-time algorithms.
• Longest paths and the traveling salesperson problem (TSP) are two well-known examples of NP-complete
problems. What makes both these problems difficult is that we do not have a simple rule for selecting the
next element to include in the solution.
• All NP-complete problems can be reduced to all the others, so an algorithm for solving any NP-complete
problem solves every NP-complete problem. The question of P versus NP asks whether it is possible to
design a deterministic polynomial-time algorithm for solving any—and therefore all—of these NP-
complete problems.
• Most theoretical computer scientists believe that it is impossible to design an efficient algorithm for
longest paths, TSP, or any other NP-complete problem. An efficient algorithm for any one NP-complete problem would not only directly solve routing and logistics problems but would also enable massive
advancements in drug discovery through scientific simulation, for instance. It would also break essentially
all modern Internet security and password systems—among thousands of other problems.

Review Questions
1. Why did we introduce tree data structures as an alternative to linear data structures?
a. Some complex data can only be represented with a tree data structure.
b. Some simple data can only be represented with a tree data structure.
c. Linear data structures cannot implement sets and maps.
d. Tree data structures are typically more effective at implementing sets and maps.

2. How does the graph abstract data type differ from other abstract data types?
a. It can model the relationships between elements.
b. It is more efficient than other abstract data types.
c. It can solve problems that other abstract data types cannot solve.
d. It does not specify a particular data structure implementation.

3. What abstract data type do binary heaps most commonly implement?


a. lists
b. sets
c. maps
d. priority queues

4. What is one way to describe the relationship between algorithms, problems, and modeling?
a. Algorithms are the foundation for problem models.
b. Algorithms solve a model of a problem.
c. Each algorithm can only be used to solve a single problem.
d. Each problem can only have a single model.

5. At what point do computer scientists apply algorithm design patterns?


a. when learning canonical algorithms
b. when modeling a problem
c. when solving new problems
d. when analyzing an algorithm

6. How does the model of computation relate to the problem model?


a. The model of computation is synonymous with problem model.
b. Problem models constrain the model of computation.
c. The model of computation constrains the problem modeling process.
d. The model of computation describes a single algorithm for each problem model.

7. Why is case analysis important?


a. Case analysis provides an alternative to asymptotic analysis.
b. Case analysis focuses on small inputs.
c. Case analysis simplifies the step-counting by introducing a cost model.
d. Case analysis considers factors other than the size of the problem.

8. What is true about the order of growth of binary search with respect to the size of the sorted list?
a. In the best case, the order of growth of binary search is constant.
b. In the best case, the order of growth of binary search is logarithmic.
c. In the worst case, the order of growth of binary search is constant.
d. In the worst case, the order of growth of binary search is linear.

9. How does time complexity relate to space complexity?


a. Time complexity measures efficiency according to the size of the problem, while space complexity
does not.
b. Both time and space complexity measure the efficiency of algorithms as they relate to the nature of
the problem.
c. Time complexity focuses on asymptotic analysis while space complexity focuses on experimental
analysis.
d. Both time and space complexity can apply methods from asymptotic analysis and experimental
analysis.

10. What are the three steps in divide and conquer algorithms?

11. Why do many greedy algorithms fail to compute the best solution to a combinatorial problem?

12. What are the three steps in reduction algorithms?

13. What is quicksort’s algorithm design pattern and algorithmic paradigm?


a. Quicksort is an application of the binary search tree algorithm design pattern and an example of
the divide and conquer algorithmic paradigm.
b. Quicksort is an application of the binary search tree algorithm design pattern and an example of
the brute-force algorithmic paradigm.
c. Quicksort is an application of the binary search tree algorithm design pattern and an example of
the greedy algorithmic paradigm.
d. Quicksort is an application of the binary search tree algorithm design pattern and an example of
the randomized incremental construction algorithmic paradigm.

14. What graph problems can breadth-first search solve?


a. exponential node tree
b. minimum spanning trees
c. unweighted shortest paths
d. weighted shortest paths

15. What is a primary drawback of hashing?


a. There can be collisions as the same element can hash to multiple values.
b. There can be collisions between multiple elements that hash to the same value.
c. Hashing is slower than binary search for search problems.
d. Hashing is faster than binary search for search problems.

16. What is the relationship between Turing machines and models of computation?

17. What is one of the three key ideas of the Turing machine?
a. infinite memory by virtualization
b. a memory bank for storing data
c. using divide and conquer to always reduce an algorithm to O(n) runtime
d. using divide and conquer to always reduce an algorithm to O(1) runtime

18. What is P versus NP?


a. P refers to the polynomial time complexity class, whereas NP refers to the nondeterministic
polynomial time complexity class.
b. P refers to any Big O notation past O(N3), whereas NP refers to any Big O notation less than O(N3).
c. NP is a constant runtime, whereas P is polynomial runtime.
d. P refers to constant runtime, whereas NP is linear runtime.

Conceptual Questions
1. Explain the difference between algorithms and programs.

2. Explain the difference between data structures and abstract data types.

3. Explain the relationship between data representation, data structures, and algorithms.

4. What is the relationship between search algorithms and the searching problem?

5. What is the relationship between search algorithms and the autocomplete feature?

6. Why is algorithmic correctness difficult to determine?

7. What are some limitations of experimental analysis?

8. What are some benefits of experimental analysis over asymptotic analysis?

9. Why is a 1-element list the best-case situation for sequential search?

10. If phone numbers are ten digits long and can contain digits from zero through nine, what is the total
number of potential phone numbers?

11. Why might we prefer a sub-optimal greedy algorithm over a correct brute-force algorithm?

12. What’s problematic about the statement, "municipal broadband planning reduces to Kruskal’s algorithm"?

13. Describe the relationship between the pivot element and the left and right sublists after the first partition
in quicksort.

14. The runtime of Kruskal’s algorithm is in O(|E| log |E|) with respect to |E|, the number of edges. What
primarily contributes to this linearithmic order of growth?

15. Both Prim’s algorithm and Dijkstra’s algorithm are greedy algorithms that organize vertices in a priority
queue data structure. What is the difference between the ordering of vertices in the priority queue for
Prim’s algorithm and Dijkstra’s algorithm?

16. What is the relationship between models of computation and algorithms?

17. What is the common difficulty preventing us from designing an efficient algorithm for solving NP-
complete problems?

18. What are the consequences of P = NP?

Practice Exercises
1. Binary search trees organize elements in ascending sorted order within the tree. However, binary search
trees can become unbalanced. In the worst-case, a binary search tree can look exactly like a linked list.
Give an order for inserting the following numbers into a binary search tree such that the resulting tree
appears like a linked list: 7, 3, 8, 1, 2, 5, 6, 4.

2. Consider these two different approaches for implementing the priority queue abstract data type using a
linked list data structure: (1) organize the elements by decreasing priority value, and (2) organize the
elements arbitrarily. Describe algorithms for inserting an element as well as retrieving and removing the
highest-priority element from these two data structures.

3. In our definition of a priority queue, we emphasized retrieval and removal of the highest-priority
elements—a maximum priority queue. What if we wanted to instead prioritize retrieval and removal of the
lowest-priority elements—a minimum priority queue? Describe a simple change that we could make to
make any maximum priority queue function as a minimum priority queue.

4. Formally describe the problem model for a drug administration medical system in terms of input data and
output data represented as lists, sets, maps, priority queues, and graphs.

5. Formally describe the problem model for a music recommendation system in terms of input data and
output data represented as lists, sets, maps, priority queues, and graphs.

6. There can sometimes be thousands, if not millions, of results that match a Web search query. To make this
information more helpful to humans, we might want to order the results according to a relevance score
such that more-relevant results appear before less-relevant results. Describe how we can solve this
problem of retrieving the N-largest elements using the following algorithm design patterns: (1) a sorting
algorithm, and (2) a priority queue abstract data type.

7. What is the best-case and worst-case Big O order of growth of sequential search with respect to N, the size
of the list?

8. What is the best-case and worst-case Big O order of growth of binary search with respect to N, the size of
the sorted list?

9. What is the worst-case Big O order of growth of sequential search followed by binary search with respect
to N, the size of the sorted list?

10. What two sublists are combined in the final step of merge sort on the list [9, 8, 2, 5, 4, 1, 3, 6]?

11. If a connected graph has unique edge weights, will Kruskal’s algorithm find the same minimum spanning
tree as Prim’s algorithm? How about a connected graph with duplicate edge weights?

12. What is a reduction algorithm for the problem of finding the median element in a list?

13. The heapsort algorithm applies a binary heap priority queue ordered by the comparison operation to sort
elements. What can we say about the first element removed from the binary heap if it implements a
minimum priority queue? What about the last element removed? Is the binary heap data structure sorted?

14. Hashing algorithms can provide a constant-time solution to the search problem under certain conditions.
What are the conditions necessary to ensure the runtime of a hashing search algorithm is constant?

15. Why is it the case that depth-first search cannot be directly applied to compute an unweighted shortest
paths tree?

Problem Set A
1. Linked lists and binary search trees are two examples of linked data structures, in which each node in the
data structure is connected to other nodes. Given a linked list of the numbers one through seven,
organized in ascending sorted order, draw two different examples of binary search trees representing the
same numbers.

(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

2. Given the following binary search tree, draw the corresponding ascending-sorted linked list containing the
same numbers.

(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

3. We defined autocomplete as a problem that takes as an input a string of letters that might represent the
start of a word, and outputs a list of words that match the input. Describe two other problem models for
autocomplete and define the ramifications of the models.

4. Compare and contrast the autocomplete problem models. What are the trade-offs of each problem
model?

5. Consider the problem of arranging desktop icons in a grid layout so that they fill column-by-column,
starting from the left side of the screen. Suppose we are given a list of desktop icons in alphabetical order
and that placing an icon in a grid position is a constant-time operation. What is the Big O order of growth
of an algorithm that takes each icon in the given order and places each icon in the next open grid position
with respect to N, the number of desktop icons?

6. Suppose we’re given a list of desktop icons in alphabetical order, but that placing an icon in a grid position
is a linear-time operation with respect to the number of icons. (Perhaps a sequential search is needed to
check that the icon has not already been placed on the desktop.) What is the Big O order of growth of this
icon arrangement algorithm with respect to N, the number of desktop icons?

7. Why does the greedy interval scheduling, which selects the cheapest, least time-consuming task, fail to
maximize the number of completed tasks?

8. What is a simple rule for greedy interval scheduling that will maximize the number of completed tasks?

9. The runtime of most binary heap priority queue operations is in O(log N) with respect to N, the size of the
priority queue. The logarithmic factor is due to the height of the binary heap data structure. But finding an
element in a binary heap typically requires sequential search. How can we apply the hashing algorithm
design pattern to find an element in a heap in constant time?

10. The runtime of breadth-first search is in O(|V| + |E|) because each reachable vertex and edge is
processed one-by-one during the level-order graph traversal. Why is the runtime of Prim’s algorithm in
O(|E| log |V| + |V| log |V|)? Explain in terms of the time it takes to process each vertex or edge in the
graph.

Problem Set B
1. Each unique key in a map is paired with one (possibly not-unique) value. Sometimes, we want to associate
more than one value with a given key. Describe how we can use a list or set abstract data type to associate
more than one value with a unique key.

2. Each vertex in a graph can be labeled with a unique identifier, such as a unique number. Describe the
relationship between adjacent vertices. How could we use abstract data types such as lists, sets, and maps
to represent these relationships?

3. Describe how we might implement the graph abstract data type using other abstract data types such as
lists, sets, and/or maps. Explain for graphs whose edges have associated weights as well as graphs whose
edges do not have associated weights.

4. Describe two or three different problem models for a medical system designed to help doctors
recommend preventive care for patients.

5. Compare and contrast the medical system problem models. What are the trade-offs of each problem
model?

6. How does the choice of problem model affect potential algorithms? Describe an algorithm for each
problem model.

7. Consider an autocomplete implementation that relies on a sequential search to find all matching terms.
What is the worst-case Big O order of growth for computing a single autocomplete query with respect to
N, the number of potential terms?

8. Consider an autocomplete implementation that sorts the list of potential terms and then performs binary
search to find the matching terms. If the order of growth of the sorting algorithm is in O(N log N), what is
the worst-case Big O order of growth for computing a single autocomplete query with respect to N, the
number of potential terms?

9. Why might algorithm designers prefer to use binary search instead of sequential search for autocomplete?

10. In a 1-D space where each point is defined with x coordinates, a common problem is to find the closest
pair of points in the space: the pair of points that has the least distance among all potential pairs of
points. What is a brute-force algorithm for solving this 1-D closest pair problem? What is the Big O
notation order of growth of this algorithm?

11. In a 2-D space, each point is defined with (x, y) coordinates. What is a brute-force algorithm for solving the 2-D closest pair problem?

12. What are the recursive subproblems in a divide and conquer algorithm for solving for the closest pair
problem?

13. Digital images are represented in computers as a 2-D grid of colored pixels. In image editing, the flood fill
problem takes a given starting pixel and replaces all the pixels in a contiguous region that share the same
color with a different color. How can we represent the colored pixels in a digital image as a graph with
vertices and edges for the flood fill problem?

14. How should we modify a graph traversal algorithm to solve the flood fill problem using your graph
representation?

15. What is a reduction algorithm for reducing from the flood fill problem to the graph traversal problem? In
this case, the graph traversal algorithm cannot be modified. Instead, define a preprocessing step to create
a graph representation that encodes the flood fill same-color rule.

Thought Provokers
1. Maps can be defined in terms of sets: every map is a set whose elements represent key-value pairs, where
the key must be unique, but the value might not be unique. Consider other relationships between abstract
data types. Can sets and maps be defined in terms of graphs? Can lists be defined in terms of maps? Can
priority queues be defined in terms of maps? Why might it be useful to define abstract data types in terms
of other data types?

2. Graph theory refers to the mathematical study of graphs. How might a graph theorist describe linked lists
and tree data structures? How does this differ from our use of abstract data types?

3. Sorting and searching are two examples of data structure problems related to the storage and retrieval of
elements. Where do sorting and searching appear in linear data structures, tree data structures, and/or
graph data structures?

4. What are some benefits and drawbacks of simpler problem models, as they compare to more complicated
problem models?

5. The formal definition of Big O notation does not exactly match our working definition for orders of growth.
Do some additional research to explain why binary search is also in O(N).

6. Since binary search is in O(N), it is also true that binary search is in O(N2). Explain why computer scientists
might find O(N2) to be a less useful description of the runtime of binary search compared to O(log N).

7. We can show that the worst-case order of growth for any comparison sorting algorithm must be at least
linearithmic using an argument from combinatorial explosion in the number of unique permutations of
elements in a list. What are the number of unique permutations of a list with N elements? How many
comparison questions need to be asked to identify a particular permutation from among all the
permutations? How do these questions relate to comparison sorting?

8. Breadth-first search is a fundamental algorithm design pattern for graph problems. How is breadth-first
search applied as a foundation for designing greedy algorithms such as Prim’s algorithm and Dijkstra’s
algorithm? How does Kruskal’s algorithm fit into these algorithm design patterns and paradigms? If Prim’s
algorithm is analogous to sorting in Kruskal’s algorithm, why is there no analogue to the Dijkstra’s
algorithm in sorting as well?

9. Suppose we want to find the longest path from a starting vertex to an ending vertex in a graph. How
might a nondeterministic algorithm solve this problem in polynomial time?

10. Suppose we want to find the longest path from a starting vertex to an ending vertex in a graph (solving
the function problem) without using a nondeterministic algorithm. Let’s say that P = NP and we have a
deterministic polynomial-time algorithm that returns whether there is a path with exactly cost k (solving
the decision problem). We also know the cost of the actual longest path. How can we repeatedly apply this
decision algorithm to design a polynomial-time longest paths function algorithm?

Labs
1. Simulate patients entering and exiting a hospital emergency room with a priority queue using patients’
time of arrival, basic assessment of severity, and availability of doctors specializing in the appropriate type
of care. Decide how to prioritize patients based on a property of each patient, such as their arrival time,
numeric severity rating, numeric urgency rating, and availability of care providers. Then, consider how
your decision might result in unfair allocation of medical care to patients.

2. Choose a lab from the Ethical Reflections Modules (https://openstax.org/r/76Ethics) for CS1. Follow the
instructions to complete the assignment. Once you have finished, answer this additional reflection
question connecting back to algorithm design: How did you utilize algorithm design patterns? How did
your choice of algorithm designs affect the end outcomes in your program?

3. Experimental analysis: Use a software-based “stopwatch” to compare the time (in microseconds) it takes to
run a sequential search versus a binary search for successively larger and larger inputs. Relate
experimental analysis to asymptotic analysis. What happens to small arrays? What happens to large-size
array inputs? What happens when the target is near the front of the array? What happens when the target
is near the end of the array?



4
Linguistic Realization of Algorithms: Low-Level Programming
Languages

Figure 4.1 Low-level programming languages support little or no abstraction; they require programmers to write software in languages that are closer to the machine's own instructions than to English, and they are well suited to the system software that powers mobile devices with limited energy and computing resources, such as prosthetics. (credit: modification of “Tilly Lockey at the SingularityU The Netherlands Summit 2016” by Sebastiaan ter Burg/Flickr, CC BY 2.0)

Chapter Outline
4.1 Models of Computation
4.2 Building C Programs
4.3 Parallel Programming Models
4.4 Applications of Programming Models

Introduction
The machines we call “computers,” including modern desktop computers, laptops, and web servers, are
remarkably fast and capacious. However, computer hardware is also embedded in devices that do not fit the
traditional definition of computers: home appliances, automobiles, smart thermostats, tools, and televisions.
Along with being energy-efficient and affordable, these devices may also need to be lightweight, portable, or
even wearable. For these reasons, embedded systems have meager processing speed and memory capacity.
This chapter focuses on low-level programming languages, which are used in practice to create software for
resource-constrained devices. Efficiency is critical to making these devices useful and economically viable. The
efficiency of embedded software is make-or-break; in other words, if we can write efficient code that runs fast
and uses little memory, we enable technologies that can help people with their daily lives. Therefore, computer
scientists place considerable emphasis on the efficiency and speed of low-level languages, which is why low-
level languages are important to society. For efficiency reasons, the syntax of low-level programming
languages relies on instructions that are computer-centric and challenging for humans to work with. This has
led computer scientists to create “middle-level” languages that emphasize human-readability without
compromising efficiency.

Consider our fictional company, TechWorks, which is bringing a line of next-generation prosthetics to the
market. While legacy prosthetics are sometimes awkward to use and limited in function, “smart” prosthetics
can be revolutionary. These prosthetics are computer-controlled, Internet-connected, and make use of artificial
intelligence, allowing those who need prosthetics to enjoy a quality of life that meets or exceeds expectations.

Computer control refers to the ability to manage, organize, or run something on a computer, whereas
intelligent control is a class of control techniques that use various artificial intelligence computing approaches.
For example, artificial intelligence algorithms can accurately determine the intentions of the wearer and
control a prosthetic’s motion in an accurate and natural way. Internet connectivity means devices can be
conveniently controlled by applications, relay telemetry to healthcare providers, and automatically apply over-
the-air updates. TechWorks has fitted a small, inexpensive, energy-efficient system-on-chip computer to its devices and is using the middle-level language C to implement these features efficiently.

4.1 Models of Computation


Learning Objectives
By the end of this section, you will be able to:
• Define low-level programming languages, including assembly language
• Define middle-level and high-level programming languages, such as C and JavaScript
• Compare and contrast the various programming paradigms

Algorithms are used to solve computational problems and create computational models. A computational model is a system that defines what an algorithm does and how to run it. Examples of computational models include physical devices that can run software, programming languages, and design specifications of either. A programming language is a computational model focused on defining a notation for writing code; it is the linguistic realization of algorithms.

Many computational models have been devised, each suited to different roles and perspectives within the worlds of computer science and software development. The end goal of software development is to create working software that runs on a hardware model: a hardware realization of an algorithm that enables a specific physical computer to execute programs. A hardware model is designed for the convenience of a machine, not a human software author, so hardware models are poorly suited to writing code. Computer scientists have therefore created programming languages, which are designed specifically for programmers to develop practical applications. These languages are usually classified as high-level languages (Java, Python) or low-level languages (assembly language). A high-level programming language operates at a high level of abstraction, meaning that low-level details such as the management of memory are automated. In contrast, a low-level programming language operates at a low level of abstraction. Languages like C and C++ can perform both high-level and low-level tasks.
Most software is designed, written, and discussed in terms of how a program should work: as a series of steps that direct how the program must execute. An example is the MapReduce model, which distributed systems such as the Google search engine use to produce search results over large data sets. Moving even further away from hardware models, computer scientists have also defined the abstract model, a technique that derives simpler, high-level conceptual models of a computer while exploring the science of what new algorithms can or cannot do.

Modern computers are equipped with a central processor, referred to as a central processing unit (CPU),
which is a computer chip capable of executing programs. A CPU’s hardware model relies on a specific CPU
instruction set architecture (ISA) that defines a list of operations that the CPU can execute, such as storing
the results of calculations (Figure 4.2). With the advancements of technology, computer engineers have
designed computer architectures with increasing sophistication and power. Examples of hardware models
include the MOS Technology 6502 architecture used by the Nintendo Entertainment System, the ARM
architecture used by mobile phones, and the x86-64 and AMD64 architectures used by modern personal
computers. Computer engineers design architectures with hardware specifications, such as execution speed or
energy use, in mind. Therefore, hardware models are not suitable for humans to use for communicating algorithms.

Figure 4.2 A standard CPU model shows how program logic applies low-level instructions to an input to get an output; the program
leverages registers and memory (black arrows) and the CPU orchestrates the overall execution of the program (red arrows).
(attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

A programming model is designed for humans to read and write. A programming model focused on defining
a notation for writing code can also be called a programming language. A programming model can be used to
implement a software algorithm using a strict set of syntactical rules. A language’s syntax can define keywords
such as “if.” The syntax may include a mathematical operator, a fundamental programming operation that
combines values, such as “+.” The syntax can define punctuation such as “;”. Essentially, the syntax gives the
precise meaning for what each of these elements directs a computer to do. The text of a program written in a
programming language is called source code. Software engineers have created practically all software by
writing source code in various programming languages. Because a computer cannot directly execute source code, a compiler or interpreter must translate a program written in a middle-level or high-level language into something the computer can execute.
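
For instance, the short C fragment below uses each of the syntactic elements just described: the keyword if, the mathematical operator +, and the semicolon punctuation that ends each statement. It is only a sketch of source code; a compiler would still need to translate it before a computer could run it.

#include <stdio.h>

int main(void) {
    int a = 2;
    int b = 3;
    int sum = a + b;       /* "+" is a mathematical operator */
    if (sum > 4) {         /* "if" is a keyword */
        printf("sum is %d\n", sum);
    }
    return 0;              /* each statement ends with ";" punctuation */
}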

As mentioned, abstract models are computational models used to think about algorithms in the abstract,
rather than being used to create and run software. The goal of an abstract model is to make it easy for people
to devise and convey algorithms. Computer scientists use abstract models to create new algorithms, analyze
the efficiency of algorithms, and prove facts about what algorithms can and cannot do. An abstract model is
not concerned with the details of computer architectures, which makes it easier to focus on these sorts of
deep questions. Examples of abstract models include the Random Access Machine, the Turing machine, and
the Lambda calculus. The Random Access Machine (Figure 4.3) models a CPU together with unlimited memory cells that can each store an arbitrary value. Just as in any other CPU, a program counter (PC) determines the statement to be executed next. A Random Access Machine can be used to analyze the efficiency of algorithms. A Turing
machine (Figure 4.4) is a mathematical model that can implement any algorithm. The Lambda calculus is a
theoretical computation concept using lambda functions. It was defined by Alonzo Church and inspired the
functional programming paradigm, which you will learn more about in Programming Language Paradigms.

Figure 4.3 A Random Access Machine has unlimited memory cells that can store any arbitrary value. (attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)

Figure 4.4 A neural Turing machine (NTM) leverages the pattern matching capabilities of neural networks in addition to more
traditional computational models. It uses a controller that interacts with external memory resources through attention mechanisms
that mimic human attention to improve performance. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0 license)

A function defines how to convert an input into an output, and functional programming is a paradigm in
which algorithms are written as mathematical functions. A technique closely associated with functional programming is recursion (refer to the following code snippet), in which a function calls itself. For example, the factorial of an integer can be computed using a recursive function.

import java.util.*;

public class Recursion {
    // Computes Val! recursively: the base case is 0! = 1, and the
    // recursive case multiplies Val by the factorial of Val - 1.
    public static int Factorial_Recursion(int Val) {
        if (Val == 0) {
            return 1;
        } else {
            return Val * Factorial_Recursion(Val - 1);
        }
    }

    public static void main(String[] args) {
        // Read an integer from the user and print its factorial.
        Scanner s = new Scanner(System.in);
        System.out.println("Enter an input value: ");
        int Val = s.nextInt();
        System.out.println("The factorial of " + Val + " is: " +
            Factorial_Recursion(Val));
    }
}

An algorithm described in an abstract model cannot be run directly. First, a software developer must
implement the algorithm, which means translating the abstract algorithm into source code in a programming
language.

It may be surprising that so many different programming models can exist, and that algorithms can be
translated from one model to another. However, this translation is by design; computer science established the
Church-Turing Thesis, which is a scientific theory stating that an algorithm can be converted from any
reasonable computational model to another. The Church-Turing Thesis provides a lens through which
computer scientists can invent many computer architectures and programming languages, all of which can
run algorithms.

Computer scientists have created terminology to make sense of the similarities and differences among all
these programming languages. For example, any programming language can be low level or high level, or it
can fall anywhere on the spectrum.

Low-Level Programming
A programming language’s level of abstraction is the degree to which a computational model, programming
language, or piece of software relates to computer hardware. A low-level language has a low level of
abstraction, while a high-level language has a high level of abstraction. In a low-level language, the
programmer must describe an algorithm in terms that the computer hardware can understand directly.
Source code must describe details such as the location of data in memory and the particulars of how the
computer calculates arithmetic.

Generally, low-level programs execute faster but are more labor-intensive to create and maintain. In a low-level
language, the programmer is forced to think deliberately about how the computer hardware executes, so the
finished program usually executes efficiently. However, that deliberate thought takes time and effort. In a high-
level language, the programmer is not burdened with thinking about so many details and can finish their work
faster while preventing certain types of programming errors from occurring. A compiler automates converting
high-level code to low-level code, but that automated process can introduce some inefficiency. In some
settings code performance is more important, and in other settings programmer productivity is more
important, which is why we have both kinds of languages.
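
As a small illustration of the details a lower-level language exposes, the C sketch below allocates and releases a block of memory by hand, something a high-level language with automatic memory management would handle for the programmer. The sizes and values are arbitrary.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* The programmer must explicitly request memory for 1,000 integers. */
    int *values = malloc(1000 * sizeof *values);
    if (values == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (int i = 0; i < 1000; i++) {
        values[i] = i * i;
    }
    printf("values[10] = %d\n", values[10]);

    /* The programmer must also remember to release the memory,
       or the program will leak it. */
    free(values);
    return 0;
}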

We can think of low-level programming languages in terms of cooking: when you cook a meal from scratch,
you control every ingredient and every detail of preparation, so the finished meal has precisely the taste and
nutrition that you desire. An alternative is to prepare a meal that uses some prepared ingredients, and when
you do that, you lose a lot of control over details, but the process is significantly faster and easier.

There are many examples of low-level programming languages, but the most fundamental language
understood by computers is made up of a sequence of binary digits.

Machine Code
The sequence of binary digits (bits) that can be understood and executed directly by a computer is called
machine code (Figure 4.5). Machine code, also known as binary code, is the lowest-level language: a program in the native format that a CPU can understand, expressed as a long series of 0s and 1s.

Machine code, or binary code, is the only computational model that a computer can execute; a program
written in any other language must be compiled or interpreted into machine code before the program can
run. The CPU of a computer is a computer chip capable of executing machine code programs (Figure 4.6). It is
impractical for humans to work with machine code directly because a machine code program is not designed
to be human-readable. The patterns of 0s and 1s are designed to be convenient for a CPU to decode, not for
humans to manipulate; and such programs are long, typically millions or billions of bits long. Another obstacle
is that machine code is hardware dependent. As discussed in 5.3 Machine-Level Information Representation,
every processor architecture has its own machine language, so machine language written for one architecture
(for example, Intel x86) cannot work on any other architecture (such as ARM). When the very first digital computers were built, and programming languages had yet to be invented, programmers had no choice but to write machine code by hand. However, this is extremely time-consuming and prone to errors, so it is almost never done today.

Figure 4.5 Machine code, with its 0s and 1s, is the only computational model a computer can execute. (attribution: Copyright Rice
University, OpenStax, under CC BY 4.0 license)

Figure 4.6 A computer chip is a computer’s central processing unit (CPU). (credit: modification of "IIci" by raneko/Flickr, CC BY 2.0)

Assembly Language
The low-level language in which every statement corresponds directly to a machine instruction is called
assembly language. Assembly language is a small step above machine code but is still a very low-level
language. Assembly language is a textual representation of machine code. Just like a machine code program,
an assembly language program is a series of instructions that a CPU will execute. However, rather than writing
the instructions in a binary format of 0s and 1s, each instruction has a textual name such as “ADD” or “MOVE.”
An assembler is a program that translates assembly language source code into machine code. As shown in
Figure 4.7, an assembler translates each textual instruction into the corresponding list of 0 and 1 bits.

Figure 4.7 An assembler is a program that translates assembly language source code into machine object code. (attribution:
Copyright Rice University, OpenStax, under CC BY 4.0 license)

While it is practically impossible for a human to write a complete program in machine code, writing programs
in assembly language is viable. Because assembly language is extremely low-level, these programs tend to run
quickly, but they are labor-intensive to write, and are machine-dependent. This type of programming was
sufficient in the 1960s, 1970s, and 1980s when software was written for one-off capital-intensive machines,
such as multimillion-dollar mainframes or space vehicles. Programmer labor was comparatively cheap then,
and there was no need to move programs to different hardware. But today, we expect applications to be
compatible with multiple kinds of platforms, including phone, computer, and gaming systems. Programmer
labor is more expensive than computer hardware, so writing entire programs in assembly language is
uneconomical. Consequently, programs are often written predominantly in a high-level language, with
assembly language used to write short excerpts on an as-needed basis. Writing code in a higher level
language makes it easier to write correct code that does not have defects.

LINK TO LEARNING

Assembly language has been used in high-profile, high-budget projects, such as Apollo 11, the NASA
spaceflight that first landed humans on the moon. You can examine the assembly code for the embedded
computers (https://openstax.org/r/76AssemblyCode) in the space vehicles, which has been released
publicly. Notice how it is quite low-level, perhaps difficult to follow, and reflects an immense amount of
fastidious work.

Middle-Level and High-Level Programming


As the name implies, a middle-level programming language is at a level of abstraction in between low-level
and high-level language, and allows for direct hardware access. The C programming language is a middle-level
language that has been in wide use since the 1970s. The C++ programming language is a middle-level object-
oriented language based on C. In general, the trade-off between low-level and high-level languages is that
writing low-level code is laborious and error-prone, but the finished code executes very quickly; high-level code
is faster, easier, and safer to write but does not run quite as quickly. Middle-level code is a compromise that
executes nearly as fast as low-level code yet has some of the productivity benefits of high-level code. Like low-
level languages, middle-level languages allow direct access to computer hardware, making it possible to write
hardware-specific programs such as operating systems and device drivers. An operating system is the
software that provides a platform for applications and manages hardware components. A device driver is a
piece of code responsible for connecting to a hardware component, such as a video card or keyboard. Figure
4.8 summarizes the trade-offs between low-level, middle-level, and high-level programming languages.

Figure 4.8 As a rule, high-level languages are less laborious to write, and slower to execute, than low-level languages. High-level
languages typically do not support direct hardware control. (attribution: Copyright Rice University, OpenStax, under CC BY 4.0
license)

Middle-level languages are ideally suited to writing systems software, programs that provide infrastructure
and platforms that other programs rely upon. The core part of an operating system that is responsible for
managing and interfacing with hardware components is called the kernel. Kernels need direct hardware
access, so high-level languages are inadequate. Practically all widespread kernels are written in C and/or C++
(such as Windows, MacOS, Linux, iOS, Android, Xbox, and PlayStation). Compilers for high-level languages,
such as Python, Java, and C#, are themselves implemented in middle-level languages such as C.
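
To give a flavor of the direct hardware access that makes middle-level languages suitable for kernels and device drivers, the C sketch below writes to a memory-mapped device register through a pointer. The address 0x40021000 and the meaning of the register bit are entirely hypothetical; in real driver code, the address and bit layout come from a specific device's documentation, and the code only makes sense on matching hardware.

#include <stdint.h>

/* Hypothetical memory-mapped control register -- for illustration only. */
#define DEVICE_CONTROL_REG ((volatile uint32_t *)0x40021000u)

/* Enable a hypothetical device by setting bit 0 of its control register.
   The volatile qualifier tells the compiler that this memory location can
   change outside the program's control, so accesses must not be optimized
   away or reordered. */
void device_enable(void) {
    *DEVICE_CONTROL_REG |= 0x1u;
}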

Figure 4.9 summarizes the various types of programming languages and how middle-level languages overlap
with low-level and high-level programming languages.

Figure 4.9 Computation models fit onto a spectrum from low- to high-level. (attribution: Copyright Rice University, OpenStax, under
CC BY 4.0 license)

A high-level language is farther from a hardware model, and closer to an abstract model. Source code in a high-level language does not address low-level details, and instead focuses on how an algorithm proceeds,
such as “visit every item in a list.” A high-level language is like viewing Earth at a high altitude, revealing large
features such as the contours of rivers and highways, whereas a low-level language is like viewing Earth at
ground level, which allows for focusing on fine details such as the activity of individual people and animals.

Web application frameworks (e.g., React, Node) are written in high-level languages, principally JavaScript. A
web framework is a special tool for building and managing web applications. Some common technologies used in web clients are HTML5, CSS, and JavaScript. Native Android apps are primarily written in the high-level language
Java, and iOS apps are primarily written in the high-level language Swift.

Programming Language Paradigms


So far, we have categorized programming languages according to their level of abstraction into low-level,
middle-level, and high-level languages. A different approach is to categorize languages into paradigms. A
programming language paradigm is a philosophy and approach for organizing code, the ideas in a program,
and the layout of its source code. Real-world programs involve many thousands of lines of source code, which
is too much for a human to digest without some kind of organizational structure. Computer scientists have
developed several different paradigms for creating this structure.

Unlike level of abstraction, paradigms do not fall on a spectrum. Instead, a particular programming language
either adheres to the philosophy of a paradigm, or it does not. For example, C is a structured procedural
language and not an object-oriented language. Without getting into too many details, C is procedural because
it allows programmers to place