Logic
Logic
(version 0.11.1)
Hillel Wayne
i
Contents
1 Intro 1
1.1 Beta Notes 1
1.2 New in v0.11: 1
1.3 Why this book 2
1.4 Design Philosophy 3
1.5 How to Read This Book 3
3 Refactoring Code 22
3.1 Simplifying Conditionals 22
3.2 Refactoring with Quantifiers 24
3.3 Programs are not Math 29
3.4 Using sets 30
3.5 Summary 31
5 Functional Correctness 41
5.1 Assertions 41
5.2 Contracts 42
5.3 Contracts vs Types 46
5.4 Polymorphism and Refactoring 48
5.5 Summary 51
7 Case Analysis 63
7.1 Decision Tables 63
ii
7.2 Another Requirements Example 66
7.3 Analyzing Code 67
7.4 Techniques 68
7.5 When is a Table the Wrong Choice? 71
7.6 Summary 72
8 Databases 73
8.1 A Relational Model Overview 73
8.2 Querying Data 74
8.3 Database Constraints 78
8.4 Constraints Are Queries 80
8.5 Summary 83
9 Data Modeling 85
9.1 Abstracting from Data 85
9.2 In Practice: Formal Specification 86
9.3 Finding Bugs with Specifications 91
9.4 Summary 93
10 System Modeling 95
10.1 Situation 95
10.2 The Logic 95
10.3 In Practice: TLA+ 100
10.4 Specification in the wild 105
10.5 Summary 106
11 Solvers 108
11.1 Logic 108
11.2 In Practice: Solvers 110
11.3 Which to use? 115
11.4 Summary 116
iii
B.1 Table of Tautologies 127
Index 143
iv
Chapter 1
Intro
1.1 Beta Notes
I’m doing early access with this book, so this is all beta. Most of the material is now
in, but I still need to polish and revise it, add more exercises, improve formatting,
and incorporate reader feedback.
I welcome any and all comments. I’m particularly interested in:
1. Do the examples seem useful to you? Were the exercises helpful?
2. Which topics need the most focus?
3. What resources would be good to recommend as “further reading”?
4. What examples and new topics would you like to see?
5. What needs more exercises?
You can email me at [email protected]. Thank you very much!
Note
Anything in a note box is a message from me to you as early readers. Things
I’m uncertain about, things I plan to polish more, things that I plan to write, etc.
[[double braces]] are similar. Feel free to throw comments my way!
• Brand new chapter, “Proving Code Correct”, covering proofs, loop invariants,
formal verification
• Total rewrite of “Database” chapter:
– Now covers database representations, relational model, queries, joins,
and constraints
– Two new executable SQL examples on constraints
– One new image
• Total rewrite of “Functional Correctness”:
– Now covers assertions, MISU, polymorphism, advice
1
2 CHAPTER 1. INTRO
This book is meant specifically for programmers with little to no familiary with for-
mal math. In all cases, I opted for accessibility and ease-of-use over precision or
rigor. This is a technical how-to, not a textbook.
1.4.1 Notation
Mathematics shares many operations in common with programming but uses dif-
ferent representations, such as writing “and” as ∧ instead of &&. I will using pro-
gramming terminology wherever possible. I included an appendix (page 125) which
maps conventional programming symbols to math symbols.
In cases where math symbols don’t have common programming analogs (such as
∀), I have opted to use an explicit English equivalent (such as all).
Lastly comes the question of array indexing. Does the array arr start at arr[1] or
arr[0]? There is no universal programming convention, as different languages make
different choices. I would use the mathematician’s convention except that does not
exist either: different branches of mathematics make different choices too! So I
will default to 0-based indexing unless demonstrating a tool or language which uses
1-based indexing, which I will explicitly note.
I recommend first reading A Crash Course in Logic (page 5), and then moving to
whichever technique looks most interesting. Techniques chapters are independent
except when otherwise noted, in which case backreferences are provided.
The first five techniques (starting with Refactoring Code (page 22)) focus on how logic
applies to everyday software. The last four (starting with Data Modeling (page 85))
4 CHAPTER 1. INTRO
cover special logic-based tools that unlock powerful new solutions to difficult soft-
ware problems.
Large code samples are available online at https://github.com/
logicforprogrammers/book-assets.
1.5.1 Exercises
Exercises are provided to help you check your knowledge and develop your skills
further. All exercises have solutions in the back of the book. Some of the exercises
have multiple possible solutions. Your answer can be correct even if it differs from
the “official” solution!
Some questions involve writing short snippets of code. In these cases, use whatever
language you like. I will personally give examples in Python or pseudocode. When
writing Python, I have tried to make it as accessible as possible, meaning it does not
do things in an idiomatic way.
Chapter 2
A Crash Course in Logic
Formal logic is a very powerful tool, but it’s also very simple. Over this chapter, we’ll
motivate and explain all of the basic concepts and syntax. Much of it may already
be familiar to you from programming experience!
2.1 Predicates
# concrete
Positive(x) = x > 0
IsSum(x, y, z) = x + y == z
(continues on next page)
5
6 CHAPTER 2. A CRASH COURSE IN LOGIC
# abstract
CanRunProgram(c) = `c can run our program`
This is not a common mathematician convention, but it’s clear enough to program-
mers. To distinguish predicates from “ordinary functions” like add_two, predicates
will always be TitleCased and functions will always be snake_cased.
Predicates act as a bridge between how we talk about systems in a human language
and how we encode them in a programming language. Let’s come back to CanRun-
Program. I included that example because I once saw a program with these require-
ments:
The computer must have enough RAM and a fast CPU or a good graphics
card (GPU).
I found this confusing. The sentence sounds natural enough in English, but we can
find a problem by formalizing with logic. We will start by first writing predicates for
each subrequirement, like so:
These predicates are abstract because we don’t know the specifics of what these
mean. Is 64gb “enough RAM”? Is 32gb? The specifics don’t matter for us, because
this is already enough to write CanRunProgram as a concrete mathematical expres-
sion.
(Here we’re using && for AND and || for OR. This is just the convention for this book:
you may see other resources use “and” and “or” or something else. Mathematicians
use ∧ and ∨. I’m not going to use these because they’re not found on the keyboard.
We’ll also use ! for “not”; mathematicians use ¬.)
Now the problem is clearer: is a && b || c supposed to be read as (a && b) || c or as a
&& (b || c)? The predicate is malformed and we have two different ways of making it
make sense:
2.1. PREDICATES 7
# way 1
CanRunProgram(c) = RAM(c) && (CPU(c) || GPU(c))
# way 2
CanRunProgram(c) = (RAM(c) && CPU(c)) || GPU(c)
Both interpretations make sense in English! But they have different outputs for
some inputs. We can see this by listing every single possible combination of val-
ues for RAM/CPU/GPU, and see what they give for CanRunProgram. This is called a
truth table.
There are two combinations of inputs where one interpretation is false and the other
is true. It’s possible that the vendor meant the first interpretation when writing the
requirements, but I read it as the second interpretation. Then I am sure that the
program will run on my computer, the vendor never expects it to, and I get mad that
they “lied” to me. Much better to express the requirement mathematically!
Expressing properties with formal logic is less ambiguous than with informal En-
glish. For the purpose of teaching, we will assume the intended predicate is (RAM(c)
&& CPU(c)) || GPU(c)).
Tip
If you ever have trouble generating a truth table, you can try to use a truth table
generator, like here1 . Try p || !q and experiment from there.
1 https://web.stanford.edu/class/cs103/tools/truth-table-tool/
8 CHAPTER 2. A CRASH COURSE IN LOGIC
Let’s now make a variation on our predicate. Some programs have a native version
and a web version. The native version uses the local computer’s resources, the web
version does most of the processing on some cloud computer somewhere. So the
native version requires a beefy computer, but any computer can run the web client.
If a computer is running the native version, it must have enough RAM
and a fast CPU or a good graphics card (GPU) to use this program. But if
it’s not running the native version, you’re fine.
To model this, we’ll need a new predicate, Native(p). Native is a property of the pro-
gram, not the computer. CanRunProgram then depends on both the program and the
computer.
I used backticks here because half the predicate is still in informal English. It turns
out, though, that we already have the tools we need to express this. We want that if
Native(p) is false, CanRunProgram(c, p) is automatically true: we don’t need to even
look at the computer specs.
How does this work? It’s easier to see if we pull out the right hand side into a new
predicate, like Beefy(c), so we have !Native(p) || Beefy(c). Here’s the truth table for
that expression (using N(p) for Native(p) and B(c) for Beefy(c)):
named “implies” (or implication). P => Q (“P implies Q”) is the same as writing !P ||
Q. Expressed this way, our predicate is
=> binds less tightly than && and ||: A && B => C is (A && B) => C, not A && (B => C).
Exercise 1 (Implication)
Say we had two more “conditions”, so that CanRunProgram was instead
CanRunProgram(c, p) =
`true unless Native(p) and either Q(p) or R(p),
in which case (RAM(c) && CPU(c)) || GPU(c))`
Write this without using =>. Then write this with =>. Which is easier to read?
Solution (page 132)
Exercise 2
Right now RAM(c) means that “computer c has sufficient RAM”. Modify it to
mean “computer c has enough ram to run program p”. Make similar changes
for our other predicates and write CanRunProgram.
Solution (page 132)
Exercise 3
1. Using =>, write the expression “if Native(p) is true then Web(p) is false,
and if Web(p) is true then Native(p) is false.”
2. Using &&, write the expression “Native(p) and Web(p) are not both true.”
3. Using ||, write the expression “Native(p) is false or Web(p) is false.”
Solution (page 132)
10 CHAPTER 2. A CRASH COURSE IN LOGIC
2.2 Sets
CanRunProgram(c) = `c is a computer`
&& ((RAM(c) && CPU(c)) || GPU(c))
This will make writing predicates with several constrained parameters easier.
2.2. SETS 11
Note
The set of all elements our predicates are acting on is called the domain of dis-
course. So as to prevent eldritch math horrors, predicates cannot be in the do-
main of discourse: there are no predicates that take other predicates. Other-
wise you can do what you want. Usually the domain of discourse is contextually
evident, and we don’t need to write it. If you want to know more about eldritch
math horrors, check out Beyond Logic (page 129).
Notice that if we define EnoughRAM as the set of all computers with enough RAM,
then every element of that set is also in the set Computer. We say that EnoughRAM is
a subset of Computer.
The set of all subsets of a set is called the power set. As an example, if a program can
take two flags, -n and -v, there are four possible combinations of flags you can pass
in:
power_set({-n, -v}) = {
{},
{-n},
{-v},
{-n, -v},
}
Programming lists have a lot of structure, so there’s a lot of ways you can manipulate
them. Given [A, B] and [B, C], I can concat [B, C] to [A, B], concat [A, B] to [B, C], concat
[B, A] to [B, C], interlace the two…
Sets don’t have much structure, so there are only a few basic operations. Given sets
{A, B}, and {B, C}, the basic things we can do are:
1. Union them together, or smush them into one big set: {A, B} | {B, C} = {A, B, C}
2. Intersect them, or find the common elements: {A, B} & {B, C} = {B}
3. Take the set difference, or subtract one set from the other: {A, B} - {B, C} = {A}
4. That’s it!
Mathematicians like sets for their simplicity, and use them as the foundational
bedrock to build out more complex concepts, like lists. As programmers, we are
already used to working with complex concepts. Even so, sets are still useful in pro-
gramming. We will see this in the next chapter.
12 CHAPTER 2. A CRASH COURSE IN LOGIC
It’s also quite useful to map and filter sets. The standard math notation is to write
{𝑓 (𝑥)|𝑃 (𝑥)}, but that’s confusing, so instead I will steal Python notation.
• Map: {x^2 for x in set}
• Filter: {x in set: x > 2}
• Map and filter: {x^2 for x in set: x > 2}
This is sometimes called a “set comprehension” or “set builder notation”.
2.3. QUANTIFIERS 13
2.3 Quantifiers
Let’s move away from software requirements and switch to a different problem.
Software development teams often require changes to the main code to be first pro-
posed as part of a pull request, which must be reviewed by another team member.
More concisely:
A pull request must be reviewed by a team member before it can be
merged.
Let us assume that we have two sets, PullRequest and Developer, that we can use in
our predicates. I would start with this:
Both of these predicates are abstract, but it seems like we should be able to make
CanMerge concrete by defining it in terms of ReviewedBy. For this we need a quanti-
fier, or a predicate over a whole set. There are two common quantifiers in predicate
logic. The first, the one we will use here, is called some: some x in set: P(x) means
that P(x) is true for at least one x in the set set.
Note
ReviewedBy is already typed to only “accept” developers (be false if d is a poo-
dle). But the point of logic is to communicate clearly, so it is better to be clear
and explicit here.
I would read this as “CanMerge is true for the Pull Request element pr if there is at
least one element d in the set of Developers where ReviewedBy(pr, d) is true”. Or, as
just “there is some developer that reviewed the pr.”
The token some is “quantifying over” the set Developer, or alternatively is scoped to
that set. This makes our use of it a scoped quantifier. More rarely, an expression is
true for any value we care to name. For example, the statement some x in set: P(x)
=> some x in set: P(x) || Q(x) is true regardless of set is. In this case, we can choose to
leave out the sets and write
This use of some is not scoped to a set, so we call it an unscoped quantifier. Almost all
quantifiers we used will be scoped.
2.3.1 all
As it stands, CanMerge is too permissive. What happens if the reviewer found a ma-
jor securite flaw? What if five developers review the pull request and two find flaws?
Most companies use a stricter merge requirement:
A pull request must be reviewed by at least one team member, and all
reviewers must approve the request, before it can be merged.
As is our habit, we start by writing the requirements as abstract predicates.
SomeoneReviewed(pr: PullRequest) =
some d in Developer: ReviewedBy(pr, d)
EveryoneApproves(pr: PullRequest) =
`everyone who reviewed pr also approved it`
CanMerge(pr: PullRequest) =
SomeoneReviewed(pr) && EveryoneApproves(pr)
This gives us an opportunity to introduce the other quantifier: all. all x in set: P(x)
says that P(x) is true for every x in our set. With this, it seems like our new predicate
can be written like this:
EveryoneApproves(pr: PullRequest) =
all d in Developer: Approved(pr, d)
But this is wrong. This requires every single developer to approve the pull request,
including developers out sick or on maternity leave. We only want to require that ev-
ery developer who reviewed the pull request to approve it. We can fix this, though, with
implication. Recall that P => Q means !P || Q. Then ReviewedBy(pr, d) => Approved(pr,
d) means that either d approved the pull request or did not review it at all.
EverybodyApproves(pr: PullRequest) =
all d in Developer: ReviewedBy(pr, d) => Approved(pr, d)
Note
2.3. QUANTIFIERS 15
I have seen six or seven different notations that logicians use for expressions
and quantifiers. About the only thing they do agree on is the symbol for some
and all: ∃ and ∀. You might notice these symbols is not on your keyboard, which
is why I instead use ASCII words. As always, you can check the appendix
(page 125) to see some of the more conventional notations.
Exercise 8
Why do we need SomeoneReviewed at all? Isn’t it true that if everybody who
reviewed the PR approved it, then someone must have reviewed it? Find the
edge case where EveryoneApproved is true and SomeoneReviewed is false.
Solution (page 133)
Exercise 9
Define Nat as the set of “natural numbers”: 0, 1, 2, etc.
1. Write the logical statement “every natural number is smaller than itself
plus 1.”
2. Write the logical statement “0 is less than or equal to every natural num-
ber.”
Solution (page 133)
16 CHAPTER 2. A CRASH COURSE IN LOGIC
2.4 Notation
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
This isn’t completely unambiguous: what is 10..=9? I will define it as the empty set:
if a > b, then a..=b is empty. Similarly, a..<b is empty whenever a >= b.
Exercise 11
Rewrite that rule (that if a > b, then a..=b is empty) using the all quantifier. As-
sume both a and b are in the set of integers.
Solution (page 133)
2.4. NOTATION 17
Exercise 12
Write 1..=100 using set filter notation. Filter on the set Int.
Solution (page 133)
Exercise 13 (Divides)
Write IsDivisibleBy(num, divisor), which is true if num is evenly divisible by di-
visor. Use some and ..=.
Solution (page 133)
Another bit of syntactic sugar I find very useful is “conjunction lists”. Complicated
systems often have complicated requirements:
That’s hard to read! To make it easier, let’s instead write it like this:
Rules =
1. A
2. B
3. || C
|| D
4. || E
|| a. F
b. G
Numbers like 4. and letters like a. will always mean “AND”. If I want a list of “OR”s, I
will always use ||.
One last bit of syntactic sugar: sometimes we want to quantify over multiple ele-
ments in the same set. Like this:
IsUnique(list) =
all i, j in 0..<len(list):
list[i] != list[j]
IsUnique(list) =
all i, j in 0..<len(list):
i != j => list[i] != list[j]
This works but is annoying when we want to quantify over three or more variables.
So I will add a new modifier for quantifiers: all disj x, y in set: means “for all disjoint x
and y in set”, aka all distinct pairs of values in the set. With that, I can write IsUnique
in a more intuitive way.
IsUnique(list) =
all disj i, j in 0..<len(list):
list[i] != list[j]
Exercise 14
If I had some disj x, y: P and wanted to rewrite it without disj, what would that
look like?
Solution (page 134)
Exercise 15
If I had all disj x, y, z: P(x, y, z) and wanted to rewrite it without disj, what would
that look like?
Solution (page 134)
Note
This section is draft 0
In the beginning of the book, I said that logic is the mathematics of booleans, just
as arithmetic is the mathematics of numbers. Knowing arithmetic lets us simplify
expressions of numbers. For example, here is how we can simplify the function f(x,
y) = -10x + 2(y + 5x):
1. 2(y + 5x) is the same as 2y + 10x.
2. -10x + 2y + 10x is the same as 10x - 10x + 2y.
2.5. IN PRACTICE: REWRITE RULES 19
Name Rule
De Morgan’s Law !(A && B) == !A || !B
!(A || B) == !A && !B
Contrapositive P => Q == !Q => !P
And/Or Distribution (P && Q) || R == (P || R) && (Q || R)
(P || Q) && R == (P && R) || (Q && R)
Duality all x: !P(x) == !(some x: P(x))
some x: !P(x) == !(all x: P(x))
Some/Or Distribution some x: (P(x) || Q(x)) ==
(some x: P(x)) || (some x: Q(x))
All/And Distribution all x: (P(x) && Q(x)) ==
(all x: P(x)) && (all x: Q(x))
Exercise 16
Use rewrite rules to simplify !(some x: !P(x)).
Solution (page 134)
Exercise 17
Give a real-world example of each distribution rule.
Solution (page 134)
20 CHAPTER 2. A CRASH COURSE IN LOGIC
Exercise 18
The following two are not rewrite rules:
1. all x: P(x) || Q(x) == (all x: P(x)) || (all x: Q(x))
2. some x: P(x) && Q(x) == (some x: P(x)) && (some x: Q(x))
Give an example where each is wrong.
HINT: for the some case, try starting with a valid example of the right-hand-side
and show it doesn’t match the left-hand.
Solution (page 134)
Over time you’ll internalize a lot of rewrite rules. See Useful Rewrite Rules (page 127)
for a list.
2.5.1 Theorems
Rewrite rules are theorems, meaning we can work them out from other rules.
Take contrapositive, for example: P => Q == !Q => !P. We can derive it this way:
1. Start with !Q => !P.
2. Apply the definition of implication to get !!Q || !P.
3. Remove the double negative to get Q || !P.
4. Apply the definition of implication again to get P => Q.
Tada, we just proved the contrapositive rewrite rule works! Try going the other way,
starting from P => Q.
Exercise 19 (Contrapositives)
Start from P => Q and rewrite it into !Q => !P.
Solution (page 134)
2.6. SUMMARY 21
2.6 Summary
In the last chapter, we learned about “rewrite rules”, which let us simplify some log-
ical expressions. Using these rewrite rules, we can simplify code too. Starting with
the conditional !((x && y) || !x) {...}:
Each transformation in the above chain uses a solid, rigorous logical rule. As long
as we do not make a mistake in applying the rule, we do not change the value of the
expression, and we can be confident our simpler code has the same behavior.
Most of the time, we don’t write out every single step along with the name of the
applied rule, since the next steps are obvious in our heads. I “know” I can rewrite
(x && y) || !x as !x || y, in the same way I “know” that four times three is twelve. But
we can always fall back on the rewrite rules if we get confused or have deal with
something messy.
Tip
Some equations can be simplified automatically with tools, like for example
https://www.dcode.fr/boolean-expressions-calculator.
22
3.1. SIMPLIFYING CONDITIONALS 23
Technically, this is what mathematicians would call an “abuse of notation”: the body
of a condition can be any computation, while the right-hand-side of an implication
must be a boolean expression. Even so, it turns out we can manipulate if statements
and conditionals in basically the same way. Any rewrite rule for => gives us a refac-
toring of conditional code. For example, P => (Q => R) is the same as P && Q => R.
Therefore:
if P {
if Q {
R
# is the same as
if (P && Q) {
R
We can take this further. In a previous exercise (page 10) we learned that if P then Q
else R is the same as P => Q && !P => R. Presented with
if (P || !Q) {
# body 1
} else {
if (Q && R) {
# body 2
}
}
Step Rule
!(P||!Q) => (Q && R => body1) if-else
!P && Q => (Q && R => body1) De Morgan
!P && Q && Q => (R => body1) See above
!(P || !Q) => (R => body1) De Morgan
24 CHAPTER 3. REFACTORING CODE
This is the same expression that we started with except that we removed Q from
the middle. In other words, checking Q is true in the second if is unnecessary; we
already know it’s true because we are in the first if’s else branch! The code snippet
simplifies to
if P || !Q:
# stuff
else:
if R:
# other stuff
If you search GitHub, you can find a lot of code like this:
Consider what this does. It searches through a list to see if any element of the list
satisfies a property. Doesn’t that look like one of our quantifiers?
There’s some subtle differences, in that lists are not sets, but it’s close enough.
Wouldn’t is_toolchain be simpler if we could just use the some quantifier directly?
In fact, we can! Most languages have built-in quantifier functions. In Python, these
are all(bool_list) and any. Here’s what is_toolchain looks like using the quantifier:
3.2. REFACTORING WITH QUANTIFIERS 25
Going further, we can simplify expressions using quantifiers just like we would any
other logical expression.
This anonymized block of Python code comes from a large public project.
In formal logic, the condition is !(all x: P(x)) || some x: !Q(x). I can use unscoped
quantifiers because I don’t care about the type of x; this simplification should work
regardless.
My first heuristic is to try to reduce the total number of quantifiers used. Based on
the quantifier distribution rules (page 19), I know that some distributes over ||, so I will
make the dual rewrite rule (page 19) to turn the !all x: P into some x: !P.
Step Rule
!(all x: P(x)) || some x: !Q(x) init
some x: !P(x) || some x: !Q(x) duality
some x: !P(x) || !Q(x) distribution
Step Rule
some x: !P(x) || !Q(x) init
some x: !(P(x) && Q(x)) De Morgan
!all x: P(x) && Q(x) duality
This doesn’t seem to me like a significant improvement over where we were. Even
so, there was no cost to trying and it still gives us good practice. This also opens up
one more possible refactor: if P then Q else R is the same as if !P then R else Q. This
lets us remove the top-level “not”:
# OLD
if not all(P(x) for x in l) or any(not Q(x) for x in l):
do_thing()
else:
do_other_thing()
# NEW
if all(P(x) and Q(x) for x in l):
do_other_thing()
else:
do_thing()
This last refactoring could be a step too far. Programmers tend to think of the if as
the normal case and the else as the exceptional case, and by switching the two, we
may have changed how they understand the code. We must always apply our best
judgement as a software engineer.
Exercise 22
I once saw some code that used the same predicate in two quantifiers:
return any(P(x) for x in l) and all(P(x) for x in l)
Why is the any necessary? Rewrite this to use only one quantifier.
Solution (page 135)
3.2. REFACTORING WITH QUANTIFIERS 27
In the previous example, we treated the predicates P(x) and Q(x) as opaque. If we
can modify the predicates then we can often simplify code even further. One more
anonymized example:
The logical representation of this conditional is !all df in dfs: (!P(df) || Q(df)). First, we
can start by treating the predicates as opaque and rewrite the abstract expression:
Step Rule
!all df in dfs: (!P(df) || Q(df)) init
some df in dfs: !(!P(df) || Q(df)) duality
some df in dfs: P(df) && !Q(df) De Morgan
Looking at the code directly, though, reveals some useful details not present in our
opaque predicates. The first is that P(df) is actually just P: the body of the predicate,
a.chunks, does not depend on df at all. There is no reason to keep it in the quantifer,
and indeed we can extract it outside:
Step Rule
some df in dfs: P(df) && !Q(df) init
some df in dfs: P && !Q(df) P doesn’t depend on df
P && some df in dfs: !Q(df) Extraction
if any(a.chunks
and not len(a.chunks[0]) == df.npartitions for df in dfs):
# becomes
if a.chunks
and any(not len(a.chunks[0]) == df.npartitions for df in dfs):
28 CHAPTER 3. REFACTORING CODE
The second detail is that is that Q(df) takes the form len(a.chunks[0]) == df.npartitions.
We can abstract this by replacing the left hand side (lhs) with the constant c and
the rhs with the function np(df), giving us Q(df) = (c == np(df)). Then !Q(df) can be
simplified to c != np(df).
Step Rule
P && some df in dfs: !Q(df) init
P && some df in dfs: !(c == np(df)) Defining Q
P && some df in dfs: c != np(df) Negation
# OLD
if not all(not a.chunks
or len(a.chunks[0]) == df.npartitions for df in dfs):
raise_error()
# NEW
if a.chunks
and any(len(a.chunks[0]) != df.npartitions for df in dfs):
raise_error()
In general, we can define R(x) = !Q(x) to “hide” a negation. This only makes
sense if we can find a suitable, easily understandable R(x) that doesn’t muddle
things. This can be a new abstract function, like replacing !correct_password(p) with
wrong_password(). Other times, this can involve replacing an infix operator.
Tip
In these examples, we converted the program to a logical formula and did all
our rewriting before converting back. In practice it can be easier to switch be-
tween the representations: rewrite the logic, convert back to code, rewrite the
code, convert back to logic, etc.
Exercise 23
Simplify the expression !(x > 1 && x <= 10).
TODO: more
Solution (page 135)
3.3. PROGRAMS ARE NOT MATH 29
Usually you can get away with writing !P || Q instead. You can also sometimes use
the trick of replacing an implication with a filter, as in this exercise:
Our last refactoring tool is a little different. Not all languages have a built-in data
type for sets. For those that do, set types can be a fantastic tool for simplifying code.
Say that we are modeling a simple social network, where every user has a list of con-
nections. We want to write a function that, for a given user, finds all other users “one
hop away”: everybody that is connected to the input’s connections. For simplicity
we will assume that we retrieve connections with conn_list[user], where connections
is some mapping that returns lists.
Notice that the output should be a collection of unique users. Since the list does not
guarantee that by default, we have to enforce that with checks. On the other hand,
if g.members instead returned a set, we could simplify the code considerably:
We no longer need checks because the set type “takes care” of the uniqueness con-
straint for us. In addition to being simpler, this can often be more efficient. Since
sets do not need worry about order or duplicates, language implementers can make
some set operations more efficient that the corresponding list operations. I pro-
vided a Python benchmark in the book assets2 : as the connection graph grows
larger, the set-based approach becomes orders-of-magnitude faster. This holds
even we include the time taken to construct a set representation from a list rep-
resentation!
On top of these benefits, sets also provide a useful signal to other programmers read-
ing our code. When I see a codebase that uses both sets and lists, I can be confident
they are using the sets for unique unordered data and lists for data that must be or-
2 https://github.com/logicforprogrammers/book-assets/tree/master/code/chapter-03
3.5. SUMMARY 31
dered or duplicated.
3.5 Summary
Example tests are easy to write, but they are also limited. Many functions that are
not max pass these tests:
1. A function that returns the largest absolute value in the list
2. A function that returns the most common element, breaking ties with max
value
3. A function that returns the maximum of the first five elements
4. A function that just returns 3.
The more examples we write, the more invalid functions we rule out. But this is
tedious and error prone. Logic provides us an alternative: express the essential
meaning of the function, and then use this to generate hundreds of tests for us.
32
4.1. STRONG AND WEAK TESTS 33
which is totally captured in the logical expression P => Q. If P can catch a bug that Q
will miss, then P is stronger. As an example:
P = max([1, 2, 3]) == 3
R =
1. max([1, 2, 3]) >= 0
2. max([0, 1, -1]) >= 0
If P passes, Q also passes. If R passes, Q also passes. This means P => Q && R => Q.
We can further see that both are stronger than Q. Notice that P and R are stronger
than Q in different ways: P gives a more specific answer for the same input, while R
tests a wider range of inputs. Finally, neither R nor P are stronger than each other:
each will pass some version of max the other would reject. Mathematicians would
say that => forms a “partial ordering”.
[[TODO graphical diagram of this]]
While P and R are stronger than Q, neither is strong enough, on its own, to guarantee
that max is correct. This version of max passes both tests but still is incorrect:
34 CHAPTER 4. WRITING BETTER TESTS
max(list) =
if list == [1,2,3] then 3 else
if list == [0,1,-1] then 1 else
-infinity
We now have two separate ways of making a test stronger: widen the number of
inputs it tries, or make more specific claims about the outputs. The most powerful
possible test would try every possible valid input to max and make the most specific
claim possible about the output. We would call such a test a total specification (or total
“spec”) of max. It would pass if and only if max was correctly implemented, making
any other kind of direct testing redundant. In other words, if T is any test of max,
then TotalSpec => T.
What then, would be that test?
First we need to define the domain of max- the set of all valid inputs. For the pur-
poses of this chapter, I’ll say max should work for any nonempty, noninfinite list of
integers. The total specification looks like this:
TotalSpec =
all l in NonEmptyIntegerLists:
IsMax(max(l), l)
It sometimes convenient for our purposes to restrict the domain to something ex-
pressable in a language’s type system.
all l in IntegerLists:
len(l) > 0 => IsMax(max(l), l)
Now we have to define what it means to be the “maximum value” of a list. First of all,
it has to be an element of the list: if we take the max value and add ten, we no longer
have the max value. Second, no element of the list is larger than it. It is easier to see
how to formalize this property if we start by defining IsMax for sets:
IsMax(x, set) =
1. x in set
2. `no element of the set is larger that x`
Another way to say “no element is larger than x” is to say “for all elements y in the
set, x is as least as big as y.” That looks like an all (page 14) to me!
4.2. IN PRACTICE: PROPERTY-BASED TESTING 35
IsMax(x, set) =
1. x in set
2. all y in set:
x >= y
[[Programmers work in lists, not sets. We can’t use quantifiers on lists, but we can
instead use them on the set of their indices]]:
IsMax(x, list) =
some i in 0..<len(list):
1. list[i] = x
2. all j in 0..<len(list):
x >= list[j]
TotalSpec =
all l in NonEmptyIntegerLists:
IsMax(max(l), l)
Try writing a few valid tests for max, and then see if they are implied by TotalSpec.
Exercise 27 (Uniqueness)
Write the predicate IsUnique(l), which is true iff every element of l is unique. IE
IsUnique([1, 2, 3])
!IsUnique([1, 2, 1, 3])
IsMax(x, list) =
some i in 0..<len(list):
1. list[i] = x
2. all j in 0..<len(list):
x >= list[j]
TotalSpec =
all l in NonEmptyIntegerLists:
IsMax(max(l), l)
36 CHAPTER 4. WRITING BETTER TESTS
import hypothesis.strategies as s
from hypothesis import given
@given(s.lists(s.integers(), min_size=1))
def test_max(l):
max_val = f(l) # our max function
assert max_val in l # (a)
assert all(max_val >= x for x in l) # (b)
The @given is a generator (“strategy” in hypothesis’ terms) that says the input can
be any nonempty list of integers. We define all of the function’s inputs this way, pass
them to the test, run the function normally, and get the output. Finally, we check if
the output satisfies our specification.
Tip
Reminder, you can download this code sample directly from https://github.
com/logicforprogrammers/book-assets.
3 https://hypothesis.works/
4.3. NOTES ON PROPERTY TESTING 37
Finally, Hypothesis stores a database of known failures and retries them on future
runs.
A partial specification is any spec that is covered by a total spec, ie any test where
TotalSpec => PartialSpec. Every test we have seen so far besides test_max is a partial
specification.
In theory, we should never need to test a partial specification. In practice, the ma-
jority of the tests we write are partial for two reasons. [[One, most of the functions
we work with in software are too complex to be easily total specifiable.]] And even
if we can totally spec a function, partial specs help us localize the source of bugs. A
total spec failing tells us that the function is incorrect, but a partial spec failing tells
us why it’s incorrect.
For this reason, using property testing well means coming up with strong, testable
partial specifications. Most functions will have at least something expressible, often
to do with the domain of the problem:
• A dating app’s match function shouldn’t match people with cats to people with
cat allergies.
• Making a chess move and undoing it should return us to the original game
state.
• A customer who clicks “submit payment” ten times should only be charged
once.
• If we cut frames 126-143 of a video, the output will be seventeen frames
shorter and the 906th frame will now be the 889th.
38 CHAPTER 4. WRITING BETTER TESTS
Note
I could probably make those exercises.
There are also universal “tactics” that apply to many different problems in many
different domains. One of the simplest and most famous tactics: the code does not
crash on some input. This is called fuzzing and is very popular for low-level code
(where memory leaks can lead to security vulnerabilities) and parsers. Similarly,
we could test that no queries made to an API return a 500 error. If we have exception
handling in code, we can test that only “expected” inputs raise errors, and that no
other errors are raised.
Why might I want to test that I have two identical functions? One common reason
is that I might have a simple function that solves my problem, but is too slow for
production. I can use that to test a faster-but-more-complex version of the same
function. Or I might have a simplified function that works for the happy path, and
I want to make sure an edge-case-handling version still gets the same results on
“good” inputs.
My favorite use-case, though, is testing that a refactoring did not change the code’s
behavior. We can take an example from the last chapter (page 25) and show exactly
that.
import hypothesis.strategies as s
from hypothesis import given
Notice that Hypothesis is able to randomly generate functions. These behave some-
what like mocks or stubs in unit testing: they are set to take any number of param-
eters and return a boolean value. In one run P might return False for every integer,
in another it might return True for integers -1, 15, and 7.
Running this test shows that for all values, the simplified version of our function
returns the same result.
Other tactics
One of the most famous tactics is the “round-trip” property, that converting data
into another format and then back doesn’t change the data.
Roundtrip(x_to_y(x), y_to_x(y)) =
all x in X: y_to_x(x_to_y(x)) = x
The polars dataframe library found a bug this way. They generated dataframes, con-
verted the columns into python lists, then converted the lists back into dataframe
columns. The roundtrip is that column -> list -> column should give back the original
column. Hypothesis found that this could drop timezones.
Roundtrip properties are generally effective when you have a custom datatype you
want to convert into a portable format, like json or CSV.
A final useful class of tactics is “metamorphic properties”. These are properties that
relate multiple function calls together. For example, if your computer vision system
recognizes an object, it should recognize the same object if you tilt the picture by two
degrees. Or if you have query API with filters, adding a new clause to a filter should
give you a subset of the results you get without it (a real bug this found in Spotify).
40 CHAPTER 4. WRITING BETTER TESTS
4.4 Summary
5.1 Assertions
An assertion is a statement that should be true of a correct program, and is only false if
the program has a bug. In almost all modern languages, assertions are implemented
with the assert P statement, which ends the program if P is false.
Note
Assertions differ from exceptions in that exceptions can be thrown even if the
code is correct, but encounters some unexpected condition at runtime. If we
try to read a file and the file doesn’t exist, we throw an exception. Missing files
are not what we want but it occasionally happens. If we try to read a file and
somehow compute a negative file size, we raise an assertion error. Negative files
should not be possible.
With assert statements we can take the program specification from a test and embed
it directly in the function. Let’s do that with last chapter’s max function.
41
42 CHAPTER 5. FUNCTIONAL CORRECTNESS
When we add these assertions, the only property test we need to write is
@given(s.lists(s.integers(), min_size=1))
def test_max(l):
max(l)
How does this work? Say we wrote max incorrectly, like this:
The property test engine will generate a random input, like [-1]. Our function will
set out = 1, and then run assert 1 in [-1]. The assert fails, raising an error, which is
caught by the test harness and reported as a test failure.
This means that we can express the specification of a function via assertions just as
we do via predicates. Let MaxPre (for precondition) cover all of the assertions at the
beginning of the function and let MaxPost (for postcondition) be all of the assertions
at the end. Then max is correct if all l in IntegerLists: MaxPre => MaxPost.
Computer scientists often use “requires” and “ensures” to mean “precondition” and
“postcondition”, as those words are easier to read and type. I will use those terms
interchangeably.
Now that we know about assertions, preconditions, and postconditions, we can can
use them to build up “contracts”.
5.2 Contracts
Yes: if no items are available, then we call max([]), which fails the assertion assert l
!= [].
if item.available:
avail.append(item.price)
+ assert avail != [] # surprise!
return max(avail)
Max requires that every caller satisfies its preconditions. In return, max ensures
the postconditions are true of whatever it returns. For reason, we say that max has a
contract, as in “I require you fulfill your side of the contract, and I ensure I will fulfill
my side.”
For max(avail) to satisfy max’s contract, avail must be nonempty, which means there
must be some available item in items. And now max_avail_price, too, has a contract:
def max_avail_price(items):
+ assert any(i.available for i in items)
...
The postconditions also propagate. We know that every value in avail corresponds to
an available item in items, and that the function returns the largest value in avail. So
max_avail_price ensures that its output is the price of the most expensive available
item in the input.
We are beginning to run into a common issue implementing contracts: program-
ming languages are not as expressive as logic, and encoding these postconditions
purely in assert statements gets cumbersome (and computationally expensive)! So
it can be helpful to first express the contract mathematically. This is sometimes
done as comments above the function:
# returns: out
# requires: some i in items: i.available
# ensures:
# some i in items:
# 1. i.available
# 2. i.price == out
# 3. all lesser in items:
# lesser.available => lesser.price <= out
def max_avail_price(items):
...
We could also keep them separate from the code and create our own “contract no-
tation” for functions. This would make it easier to name subpredicates of our con-
tract, annotate output values, use helper predicates and functions, etc. Something
like this:
44 CHAPTER 5. FUNCTIONAL CORRECTNESS
max_avail_price(items) returns o
helpers:
available = `list of available items in items`
requires:
`has an available item`: available != []
ensures:
`output is priciest available item`:
some i in available:
1. i.price = out
2. all i2 in available: i2.price <= i.price
In practice, I have found that preconditions and inline assertions are both easier to
directly encode and more impactful than postconditions.
√
−𝑏 ± 𝑏2 − 4𝑎𝑐
2𝑎
Write a function quadratic(a, b, c) that computes the quadratic for-
mula, using the preconditions and postconditions of both sqrt and
division (no dividing by 0!) Use any language you’d like.
3. Given
x = 5
# requires (a): x >= 0
y = sqrt(x)
# requires (b): y >= 0
z = sqrt(y)
In a typical language, if any of the assertions fail, the program crashes. Sometimes,
we prefer to recover gracefully. Other times, a crash is our best option.
To see why, consider the case where we didn’t have any contracts at all and called
max_avail_price with no available items. The best case, max tries to index an empty
list and throws an error, so we crash anyway. The worst case (JavaScript, Ruby), max
indexes an empty list, returns null or undefined, which is then sent to causes trouble
in some distant part of our code.
Not only do the contracts raise problems earlier, they help us debug the problem
better. Think of the contracts as a series of checkpoints the code must pass through:
46 CHAPTER 5. FUNCTIONAL CORRECTNESS
Note
(v0.11) I need to make a picture for this but in the meantime, here’s a sketch in
a codeblock:
.
max
-----------------
MAPPre => MaxPre => MaxPost => MAPPost
--------------------------------------
max_avail_price
If the MapPre contracts pass but the MaxPre contracts fail, then max_avail_price must
have done something wrong in between setup and calling max. If MaxPre passes and
MaxPost fails, the bug is probably in the implementation of max. If MAPPre fails then
the bug is in whatever is calling max_avail_price.
Tip
Sometimes we prefer to check our assertions in development and testing, but
disable them in production. For this reason, most languages with assertions
support disabling them with a flag. In Python this is the -O flag.
At this point, contracts and assertions look very similar to a more popular software
tool: the type system. max(l) has the type signature list[int] -> int, meaning it requires
that l be a list and ensures that it returns an integer. Many languages can check types
statically, and the typechecker determines that max does not return an integer, then
there must be a problem with our function’s implementation. How does this com-
pare to contracts?
It’s hard to summarize two enormous fields of research, but as a general rule, con-
tracts are better at expressing properties than types, while types are easier to check
are correct. Types can be “replaced” with contracts but not vice versa. For example,
replacing the type signatures of max:
def max(l):
# l is a list of integers
assert type(l) == list
assert all(type(x) == int for x in l)
(continues on next page)
5.3. CONTRACTS VS TYPES 47
It’s not as obvious how to convert the contracts “l is nonempty” or “out is the largest
element in l” back into types! Contracts can encode arbitrary computations, while
types cannot.
The flip side of “encoding arbitrary computations” is that types can be checked at
compile time and contracts cannot. At least, not without special tools and a lot of
work (see Proving Code Correct (page 53)). For this reason, it’s generally a good idea
to use types where possible and contracts only where necessary.
In many languages, it is possible to encode complex properties in clever type defi-
nitions. Say we want to give the item data type the two boolean fields available and
cancelled, and we want to guarantee that they cannot both be true. Instead of two
booleans, we could use the enumerated field status: {avail, unavail, cancelled}. Then
there is no possible way to have an item that is available and cancelled. This tech-
nique is known as “Making Illegal States Unrepresentable”, or MISU.
We have types, we have contracts, we must have contracts on types. A type invari-
ant is a property that must be true for all values of a type. “An item cannot be both
available and cancelled” is a type invariant, one that we can capture in the type def-
inition. Another is “items must have a positive price”, which we have to leave as a
contract:
We need to check the invariant every time we create or mutate a value of the type.
If any function can directly modify item.price then we have our work cut out for us
chasing down every single use. If functions have to go through an item.setPrice()
method then we only need to put an assert in one place.
For this reason, type invariants can be quite useful in object-oriented programming.
OOP invariants are also called class invariants.
48 CHAPTER 5. FUNCTIONAL CORRECTNESS
Note
Languages that support them oftentimes (but not universally) only check the
invariant after calling public method. If the public method calls a private
method, that is allowed to break the invariant, as long as it is restored by the
end of the public method.
If we are working with functions over mutable values we may as well come up with
a notation for “this is what changed” in our contracts.
In representing these contracts we have two additional difficulties. The first is that
we need some way to refer to the “old value” and the “new value” of mutable data.
The few languages that support change variants use acct for the new value and
something like old(acct) for the original value.
ensures:
if ok then
1. acct.balance >= 0
2. acct.balance + item.price == old(acct.balance)
else
acct.balance == old(acct.balance)
Unfortunately, few languages support tracking the old value of a mutation: they just
change the state and are done with things. Some niche languages can track this
but for the most part we have to reason about change assertions logically, not check
them directly.
All modern languages have some sort of polymorphism feature: the ability to pass
many different types to the same function. It might be provided as interfaces,
class-based inheritance, Haskell typeclasses or Rust traits, or something more ex-
otic. Regardless of how it is done, the purpose is the same: we define a function to
take an “abstraction” (for lack of a better word) and then it an “implementation” of
that abstraction.
5.4. POLYMORPHISM AND REFACTORING 49
A common abstraction is the “Mappable”: anything where we can insert and retrieve
values at specific keys. Here V is a “generic” for any type.
# requires: k in keys()
get(k: str): V
# ensures: get(k) == v
put(k: str, v: V)
}
If I write a function f that takes a Mappable, I could pass in any type that implements
get(), put(), and keys(). But I have placed contracts on the abstraction’s methods: get
has a precondition and put has a postcondition. If the body of f is compatible with
these contracts, it doesn’t matter what implementation I put in, I should be safe.
But those implementations come with contracts of their own! Consider the imple-
mentation Counter, which we might use to track how many of each value are in a
list:
impl Counter {
d: Dict[Int]
# other definitions...
This get doesn’t have the same preconditions as the abstraction; it doesn’t have any
abstractions at all! If we take last chapter’s notion of weaker and stronger, where
Strong => Weak, it is always safe to weaken preconditions in an implementation. Map-
pingPre => CounterPre. On the other hand, if our new precondition is stronger or
incomparable, code satisfying the abstract precondition might not satisfy the im-
plementation. Such an implementation isn’t guaranteed to be “safe”.
Postconditions behave differently. Let HistoryMap be an implementation of Map-
pable that tracks the history of each key’s values.
impl HistoryMap[V] {
hash: Dict[V]
hist: Dict[List[V]]
If any code following a put depends on Mapping’s ensurances, they can also depend
on HistoryMap’s ensurances. So it is always safe to strengthen postconditions: Histo-
ryMapPutPost => MappingPutPost. If our implemented postcondition does not imply
our old one, we are once again at risk: our implementation may not longer provide
the guarantees we need.
Note
If you are familiar with object-oriented languages, you might notice how sim-
ilar this is to the Liskov Substitution Principle. This is not a coincidence. Bar-
bara Liskov’s model was originally defined in terms of contracts. See the paper
A Behavioral Notion of Subtyping4 .
4 https://www.cs.cmu.edu/~wing/publications/LiskovWing94.pdf
5.5. SUMMARY 51
These rules apply to any kind of “replacement”, like “replacing code with a refactor-
ing.” If we rewrite max_avail_price in a way that preserves or weakens the precon-
ditions, we are guaranteed to not break any existing use of the function anywhere
in our codebase, and the same with preserving or strengthening a postcondition.
This does not mean that violating this rule is guaranteed to fail. If we know that
our codebase always calls max_avail_prices with a hundred available items, we can
strengthen safely the precondition to always require a hundred items. However, this
carries a risk that some rarely-seen codepath can now blow everything up.
5.5 Summary
• Asserts are statements that are only false if the program has a bug. Typically,
if an assertion fails, we crash the program (though this is often configurable).
• Assertions that must hold going into a block of code are called precondi-
tions/requirements, and those that must hold exiting a block of code are called
postconditions/ensurances. These are part of a function’s “contracts”.
• Contracts “spread” from everywhere a function is used. If X calls Y, X must
guarantee Y’s preconditions and can safely assume Y’s postconditions. This
makes them useful for catching and debugging errors.
52 CHAPTER 5. FUNCTIONAL CORRECTNESS
• Types and contracts share similar roles but have different, synergistic prop-
erties. Types can also have contracts, which are called “type invariants”.
• We can use contracts to understand if certain refactorings or substitutions are
“safe”.
Now that we are familiar with specifications and contracts, we can do something
extraordinary: we can mathematically prove our code correct. This will be the
focus of the next chapter.
Assertions have been around since the era of vacuum tube computers. The first lan-
guage with function contracts was the Euclid Research Language5 . They were fur-
ther popularized in OOP by Bertrand Meyer and his language Eiffel, who also named
the term “Design By Contract”. [[More recently, contract-heavy programming as
seen a revival as a component of the broader “Negative Space Programming” style,
such as with Tigerbeetle]].
Other languages with built-in contract support include D, Ada, Clojure, and Racket
(which is the predominant language used to “research” contracts). Most languages
have at least an assert statement, and many have a third party contract library in
the ecosystem (such as Java’s JML6 .
In practice, contracts and assertions tend to be most often used in “low-level” or “al-
gorithmic” programming, which needs to maintain more internal properties (and
where more things can go wrong). John Regeher has an excellent overview7 on the
use of assertions in this context.
5 https://dl.acm.org/doi/10.1145/954666.971189
6 https://www.cs.ucf.edu/~leavens/JML/index.shtml
7 https://blog.regehr.org/archives/1091
Chapter 6
Proving Code Correct
In programming, we want our software to be correct. Common programming tools
give us confidence in correctness but not certainty. Testing only shows code is cor-
rect for some inputs, while compilers and conventional typechecking show code par-
tially correct for all inputs.
If we want to be sure that a function is totally correct for all inputs, we need a more
powerful approach. We need to use logic to prove the correctness.
First we will cover what we mean by a proof and how we can “prove software correct”,
and then show how it’s done. This chapter is a little more mathematically involved
than the other technique chapters.
53
54 CHAPTER 6. PROVING CODE CORRECT
6.2 Proofs
Take the function qr(x, y) , which returns the quotient and remainder for two positive
numbers. For qr(19, 3), the result should be (6, 1), since 6*3+1 == 19. The contract
form of the total specification is
To prove that qr is fully “correct”, we need to prove that (a), (b), and (c) always hold
for all inputs that satisfy our preconditions. Let’s start with a simple, linear version
of the function:
The first line has a division, which requires that y != 0. We know from qr’s require-
ments that y > 0, and y > 0 => y != 0. So, assuming the preconditions hold, this will not
throw a divide-by-zero error at runtime.
The division ensures that tmp1 * y == x, which is knowledge we can use in the next
steps. Following through the rest of the algorithm tells us that r == x - tmp2 == x - q*y,
so we use some high-school algebra and rewrite that to q*y + r == x. That satisfies
postcondition (a).
That alone is not enough: -11*3 + 14 == 19, but those certainly are not the quotient
and remainders! That’s why we have the second postcondition, which requires a
little more reasoning. Just like we did our rewrite rules step by step, we can do the
algebra here step-by-step:
Step Rule
r == tmp1*y - q*y init
r == (tmp1 - q)*y distribution
r == (tmp1 - floor(tmp1))*y definition of q
0 <= tmp1 - floor(tmp1) as tmp1 >= floor(tmp1)
tmp1 - floor(tmp1) < 1 as floor chops off the decimal
0 <= (tmp1 - floor(tmp1))*y < y multiply all terms by y
0 <= r < y definition of r
We could also implement qr by repeatedly subtracting y from x until we are left with
a number under y. Then the number of subtractions is q and the remainder is what’s
left.
This is more complicated to prove because we don’t know how many times the while
loop will run. qr(100, 60) will only run the loop once, while qr(100, 6) will run it
sixteen times!
What we need to do is find a loop invariant, something that holds true on every iter-
ation of the loop. That means, at the very least:
1. It must be true when we enter the loop
2. It must be true after every loop iteration
3. It must be true when we exit the loop.
This is the invariant I will pick for our loop:
# loopinv: q*y + r == x
The loop invariant looks a lot like our top-level ensurance, just on prefixes of the
list instead of the whole list. This is common and intentional: the loop invariant
progressively “builds up” the top level postcondition, by showing it holds for every
step of building the output. A similar approach can be used to prove the correctness
of recursive functions.
“We can prove this code is correct” is logically HaveProof => Correct. Contraposi-
tively, !Correct => !HaveProof, as in it is impossible to prove incorrect code.
This is where proof most differs from our usual forms of verification. Incorrect code
can still test correctly, if it is correct for most of the inputs. And it can still typecheck
properly, as long the bug does not change the types of the values.
To see this, let’s look at an incorrect version of qr:
The change is that I replaced r >= y with r > y. This adds a bug that only appears
58 CHAPTER 6. PROVING CODE CORRECT
when x is a multiple of y. If we try to prove this code correct from scratch, we will
quickly determine that while (a) and (c) still hold, (b) does not: r == y is a possibility!
It’s fairly likely that test suite would cover that case, but for more complex behavior,
we are more likely to miss some unusual edge case in our test suite.
Then again… for complex code, we are also more likely to make a mistake in the
proof, or not notice that the existing proof is invalidated by a code change. At least
a test suite can be rerun on every change. For proofs to be practical, we would need
some way to programmatically check proofs for correctness.
Enter formal verification.
6.3.1 qr in Dafny
(Dafny treats division between integers as floor division). The compiler will then try
to prove all of the ensurances are satisfied, given the requirements. In this case, it
is smart enough to prove correctness without any help from us. What happens if we
introduce a bug?
- q := x / y;
+ q := x / y - 1; // bug
The compiler gives us an error (Fig. 6.1) saying it cannot prove ensures r <= y. Inter-
estingly, it can still prove ensurances (a) and (c). Similarly, if we remove requires x
>= y, it can no longer prove that ensurance (c), but can still prove (a) and (b).
Dafny fails compilation, unable to prove ensures r <= 0. Formal verification tools
often need help proving things that seem obvious to us. Then again, sometimes
what’s obvious to us is actually incorrect, and Dafny will never make that kind of
mistake. The only help we need to give it is to add another loop invariant saying
that r >= 0 on every loop iteration:
while r >= y
invariant q*y + r == x
+ invariant r >= 0
{
If formal verification can prove code correct, why bother writing tests?
Because formal verification is hard. Very hard. It demands we prove every sin-
gle postcondition we declare, often to the satisfaction of limited tools. Dafny’s
first major success story was the IronFleet paper10 , where researchers verified two
distributed systems in Dafny. In their restropective, they noted that it took 3.7
person-years to develop and prove 5114 lines of code, a rate of about four lines of
verified code per workday. This is considered fast by proof standards.
And just because code is proven correct does not mean it is actually correct! It is only
“correct” if all of our assumptions hold and we only depend on proven properties. A
customer might depend on qr being reasonably fast; replacing it with qr_loop would
ruin their day. And max assumes nothing else is modifying l as we iterate through
it. In a multi-threaded program this may not be a safe assumption.
For most use cases, it is more economical to rely on informal proofs, inline con-
tracts, and property testing. Formal verification only becomes economical for
high-risk, high-severity software, like code involved in cryptography or low-level
10 https://www.andrew.cmu.edu/user/bparno/papers/ironfleet.pdf
6.4. SUMMARY 61
systems. And even in those systems, only one or two core libraries need formal ver-
ification, while the rest does not need that level of scrutiny. For this reason, FV lan-
guages often support compiling into mainstream programming languages. Dafny
supports Python, C#, and Go, among others.
@staticmethod
def qr(x, y):
q: int = int(0)
r: int = int(0)
q = _dafny.euclidian_division(x, y)
r = (x) - ((q) * (y))
return q, r
6.4 Summary
using logic to improve our coding, but writing code is only part of our professional
work. In the next chapter, we will use logic to help us better understand our project
requirements.
Proving code correct via contracts starts with Tony Hoare’s Hoare Logic11 , first in-
troduced in 1969:
Dafny uses an extension of Hoare Logic called “separation logic”, which better cov-
ers language features like aliasing, memory manipulation, and concurrency. The
Dafny website12 has more material on its advanced features and a collection of tu-
torials and references13 .
Not all formal verification languages are based on contracts. Languages like Liquid
Haskell and Idris have type systems that are far more powerful than mainstream
languages, powerful enough to encode the complete specifications of functions.
Contracts tend to be more popular with procedural languages, and types tend to be
more popular with functional languages.
Formal verification is also often done with proof assistants, tools meant for proving
mathematical theorems, adapted to instead prove programs correct. Isabelle, Rocq,
and Agda have been around for a long time. Recently, lean14 is relatively new, but
rapidly rising in popularity. You can also learn Lean through interactive games15 ,
like “The Natural Numbers Game”.
To help people compare different formal verification languages, I maintain the Let’s
Prove Leftpad16 project, where the properties of JavaScripts which compares over
two dozen such languages with explanations.
11 https://en.wikipedia.org/wiki/Hoare_logic
12 https://dafny.org/
13 https://dafny.org/latest/toc
14 https://lean-lang.org/
15 https://adam.math.hhu.de/#/
16 https://github.com/hwayne/lets-prove-leftpad
Chapter 7
Case Analysis
A surprising number of the problems we solve with software are about making “de-
cisions” based on combinations of inputs:
• An application might decide what to show a user based on what feature flags
are set and what part of the world they are in.
• A load balancer might decide whether to spin up or wind down servers based
on server load, minimum/maximum constraints, and time of day.
• An airline might decide whether to offer a perk based on the user’s ticket type,
traveller class, and credit card used.
• A popular text editor decides every setting’s value based on the global default,
language default, the custom user global setting, custom user language set-
ting, custom user project setting, custom user project language setting, and
whether the option takes a number or an object.
When we are asked to implement this kind of software, we don’t get the require-
ments as an exhaustive set of possible combinations, we are given a set of rules.
And this leads to bugs in the requirements themselves, where the rules have a gap in
their coverage… or a contradiction.
The simplest logical tool for analyzing cases is the decision table. The concept can
be learned in minutes, used even by nontechnical team members, and is broadly
useful in finding problems in both code and human requirements. They only work
if the decision depends on a finite set of combinations, but that is a large enough
category to make them worth knowing.
To make a decision table, write down every combination of possible inputs, write
the output for each input, and then put them in a sorted table.
That’s it, that’s decision tables.
In fact, we have already used decision tables earlier in the book. A truth table (page 7)
is just a decision table where all of the inputs and outputs are booleans.
Let’s see an example. Imagine we are managing an event’s ticket page, and are asked
to provide these discounts:
• First 100 registrations get a 10% discount
63
64 CHAPTER 7. CASE ANALYSIS
The requirements are incomplete because they do not specify what should happen
when someone is eligible for two discounts. Incomplete requirements have multiple
valid “extensions”: there are different, perfectly sensible ways to complete them. I
have seen at least four different solutions in real-world systems:
1. Only allow the one highest discount (here 10%)
2. Apply discounts in sequence (here 14.5%)
3. Add all discounts together and apply at once (here 15%)
Let us assume that in this example, the client’s choice is (1), only the maximum dis-
count applies. In this case, if the attendee is an early registrant, it does not matter
whether they are a senior or not. If a value “doesn’t matter” in the final decision, we
can make the table shorter by collapsing all of the that value’s possibilities into one
row. We conventionally call this an any value and mark it with a dash.
7.1. DECISION TABLES 65
This table has only four real rows, but each of the any values covers two possible
values for senior?, so counts as two effective rows. This means this table has six
“effective” rows, as we expected.
If the table were to have less than six effective rows, we would immediately know
that some input was missing. If the table had more than six effective rows, we would
immediately know that it repeats one input on two different rows, mapping them to
different outputs. A table that does not miss any inputs is called sound, while a table
that does not contradict itself is called complete.
This is enough to define “validity” for decision tables: a valid table is one that is
both sound and complete, while an invalid table is unsound or incomplete. This
means that a table without exactly the right number of rows is automatically invalid,
revealing a problem with our requirements.
I’m being careful to use valid, not correct. A decision table can be valid but incorrect—
say, if it does not capture what the client asked of us. But if it is invalid then it is
definitely incorrect. Logically, Correct => Valid. Validity is structural, correctness is
businessal.
Exercise 34 (Fizzbuzz)
fizzbuzz(x: Int) is a function that returns “fizz” if x is divisible by 3, “buzz” if x
is divisible by 5, “fizzbuzz” if divisible by both 3 and 5, and otherwise returns
the number unchanged. Write the decision table for fizzbuzz.
Solution (page 139)
66 CHAPTER 7. CASE ANALYSIS
Note
This might be removed in the next version (v0.12), unless enough people com-
plain about that
We have some videocall software with a “share screen” feature. The host can set two
options:
1. Whether more than one person can share at a time
2. Who can share (Host, Participants)
We’ll use a decision table to model whether I can share my screen or not. Based on
these options, there are four possible inputs in our decision table:
1. Can only the host share?
2. Am I the host?
3. Is someone else sharing?
4. Is multishare enabled?
All of these are booleans. Our table then has 24 = 16 virtual rows total. Here we go:
O H S M out
1 T T T - ERROR
2 T T F - T
3 T F - - F
4 F - T T T
5 F - T F F
6 F - F - T
I marked row (1) as “error” because it should be an impossible state: “only the host
can share”, “we are the host” and “someone else is sharing” cannot all be true at the
same time. If we see this case in production, there’s a bug somewhere in our system.
Counting the rows with one any as two virtual rows, and the rows with two anys as
four, we have a total of 16 unique rows. That means there’s no missing rows in-
dicating a missing requirement, and no duplicate rows indicating a requirement
contradiction.
7.3. ANALYZING CODE 67
But just because the table’s complete doesn’t mean it’s correct. This version of the
table has a bug. Can you see it?
It’s row (5). If multishare is disabled and someone else is sharing, then I can’t share,
even if I’m the host.
One nice thing about decision tables is that even nontechnical people can under-
stand them, so you can get their involvement in checking requirements. Fixing the
table:
O H S M out
5a F T T F T (kicks other)
5b F F T F F
Exercise 35
Speaking of screensharing, I recently embarrassed myself on a video call. My
microphone has a hardware mute switch, which I had toggled on, and assumed
I didn’t need to press the software mute button. What I didn’t realize was my
webcam also had a microphone, zoom was using it, and my hardware mute
switch was doing nothing. Everybody could hear me loud and clear.
Model “Am I (m)uted” as a decision table, with the columns “(z)oom mute”,
“(h)ardware mute”, and “(w)hich mic” (desk or webcam).
Bonus: how can I prevent this from happening again?
Solution (page 139)
If a code path uses branches and no loops or recursion, we can represent its
high-level behavior as a decision table. This can be useful if the code depends on
external sources for some of the inputs. [[While the implementation may have to
spread its logic over the whole function, we can still use a decision table to organize
the high-level behavior. ]]
For example, Python’s file open function17 has different behavior depending on
what mode string was passed in, whether the string includes a “+”, and whether the
file exists. In the implementation18 , the function parses the mode string and then,
17 https://docs.python.org/3/library/functions.html#open
18 https://github.com/python/cpython/blob/main/Modules/_io/fileio.c#L248
68 CHAPTER 7. CASE ANALYSIS
much later, checks if the file exists. Representing the function as a decision table
makes it easier to see the high-level behavior.
7.4 Techniques
In Python’s open function’s mode string, we can write "r+" instead of "r". Adding +?
as an input to the table would bring it from ten effective rows to twenty. It would also
not benefit us in any way, because the + does not change any of the file effects.
Writing "r+" instead of "r" instead changes the file handle’s read/write permissions.
But those permissions do not change if the file already exists or not, unless that
would be the source of an error (like "x+" on an existing file).
These independences may not be obvious to someone looking at a giant twenty-row,
six-column table. We can instead present them with two smaller tables that only
contain the inputs relevant to each decision.
7.4. TECHNIQUES 69
There are trade offs here in how concise vs comprehensive we want the tables to be.
If it is important for the table to keep track of all error cases, then we need all three
inputs.
Decision tables aren’t the best tool for representing a whole lot of state changes, or
something getting changed multiple times, but sometimes it’s useful to show how a
single value changes.
# example code
if x % 2 == 0
x = x/2;
else
x = 3*x+1;
In a previous chapter we represented changes with old(x) (page 48). In this case, we
are going to borrow a mathematical notation that is a little more compact. In some
branches of math and science, they write x' to mean the new value of x after some
change. Here, that would give us the table
x%2 x'
0 x/2
1 3x+1
Ideally, the table should model a single step. This means that we can update multiple
values in the same table, as in a swap function:
70 CHAPTER 7. CASE ANALYSIS
x' y'
y x
We could represent the values in the step after with x'', but if we are doing that then
decision tables are probably the wrong tool for the job.
If we want to demonstrate not just that an output is impossible, but that a particular
combination of inputs is impossible, we can use a / to say not just “it doesn’t matter”,
but “it’s not possible”.
Logically, this is the same as writing a dash/any, but it signals to the reader that it
shouldn’t happen, not that it doesn’t matter.
The easiest way to accidentally make a table unsound is through misuse of anys.
Fortunately, this is also relatively easy to detect.
If we expand the anys, we have both TT -> T and TT -> F, which is unsound. Here we
can detect the issue because the table isn’t complete, as we don’t have a row for FF.
If we add that as a special case, then we’ll have 5 effective rows total, which should
alert us that the table is unsound.
One easy way to avoid this is to never place an any to the left of a fixed value. This
can lead to some table bloat but is better than having an invalid table!
7.5. WHEN IS A TABLE THE WRONG CHOICE? 71
count o
-10 T
11-20 F
21-30 T
If we know for sure count maxes out at 30 then this table is complete. If it can go
higher, we have not covered all possibilities, so the table is incomplete.
7.6 Summary
• Truth tables enumerate every possible input and output of a logical expres-
sion.
• Decision tables extend truth tables to systems, by allowing multiple inputs and
outputs.
• Decision tables are complete if they have no missing inputs, and sound if they
don’t have the same set of inputs twice.
In the next chapter, we will look at how logic can be used to better understand
databases.
Modern relational databases are based on Edgar F. Codd’s relational model. We will
not go into comprehensive detail on the model but provide an overview we need.
See the Further Reading (page 84) for a deeper dive.
A database is a set of tables, and a table is a set of records. What is a record? We
can start by saying that a record is an ordered list of values, or tuples. One example
database could be
users = {
(1, "[email protected]"),
(2, "[email protected]"),
(3, "[email protected]"),
# etc
}
# etc
For a given record u, we can get the first element with u[0], the second with u[1], etc.
This satisfies our needs for both inconvenience and incomprehensibility. It would
be better to give the elements names and types, so we could write u.id instead of u[0].
And, as we can create new notation whenever we want (page 16), I will add a simple
record syntax:
record users {
id: Int
email: String | {NULL}
}
73
74 CHAPTER 8. DATABASES
Recall that | is set union (page 11). We will define this to mean that each element
of the set users has two fields, one called id that must be an integer and one called
email that may be a string or null. You may note this is very similar to how we would
define a table in SQL:
This is intentional! We can choose our logical notation to closely match the software
systems we build. At the same time, I chose to make nullable fields explicit, where
SQL makes them the default. I find that leads to fewer mistakes.
That said, I find writing set | {NULL} a little unwieldy, so I will add a bit of syntactic
sugar and define set + x to mean set | {x}, so that we can write email: String + NULL.
In any case, because tables are just sets, we can quantify over them like any other
sets. And this is where the logical model really shines. Our two quantifiers represent
the two essential purposes of a database: querying data and ensuring data integrity.
The results of this query can be represented as the set filter {(u.email) for u in users:
u.id == 5}. It will help us down the road if we can instead explore the properties of
queries as if they were some expressions. So we can instead ask “does the query re-
turn any results at all”? That question corresponds to the expression some u in users:
u.id == 5. If we wanted multiple WHERE clauses, we could just add more clauses to
the logical expression:
-- some u in users:
-- 1. u.id == 5
-- 2. (u.email == "" or u.email == null)
Nested subqueries and common table expressions are just nested quantifiers. If I
want to query if user five belongs to any groups, I could write
8.2. QUERYING DATA 75
-- some u in users:
-- some ug in user_groups:
-- u.id == g.user_id
What does logical representation of queries actually get us? For one, it means we can
abstract complicated SQL expressions with predicates. When do two users belong
to the same group? When each user has a group membership for the same group.
As a predicate, this is easy:
And from there we can compose our predicates to make more complex queries.
But SQL databases cannot do this. Standard SQL does not support directly using
user-defined predicates in queries. As a consolation prize, we can instead gener-
ate the set of all values that pass a predicate and then use that set in other queries.
The database term for this is a view.
-- memberships =
-- {(u.id, g.id) for u in user, g in groups: Member(u, g)}
Of course that’s not how any actual SQL user would write memberships. They would
use a join!
76 CHAPTER 8. DATABASES
Most SQL tutorials “explain” joins in terms of set unions and intersection, often with
diagrams like this:
But this makes no sense. users and user_groups are disjoint sets, so the intersection
should be empty!
To properly represent an inner join, we need to introduce one small new set opera-
tion. The Cartesian product of two sets S and T is the set of all tuples where the first
element of the tuple is in S and the second element is in T. We can formally define
this via the set map (page 12):
S x T = {(s, t) for S in S, t in T}
S x T x U = {(s, t, u) for S in S, t in T, u in U}
# etc
For example, the Cartesian product of Nat x Alphabet is the set containing (0, a), (0,
b), (1, a), etc. The operator is named after René Descartes, who pioneered its use,
and is called a “product” because #(S x T) == #S * #T (where #S is the number of
elements in S).
8.2. QUERYING DATA 77
With the Cartesian product, we can represent the inner join like this:
There is no difference between the WHERE and ON clauses in the logical represen-
tation, just as there is no difference in SQL: most dialects will happily let you put a
join condition in the WHERE or a filter in the ON. If we want to inner join across three
tables, the syntax is exactly the same:
Outer joins are more difficult for beginners to learn, which may be related to the fact
that outer joins are also more complex to represent logically. A left outer join on S
and T returns all the same rows as an inner join, but also the rows of S that don’t join
with any rows on T. This is the same as this query:
Note
What about aggregate functions, like GROUP BY? This is where our logic breaks
down a little. Aggregates were never part of the relational model and act more
like “postprocessing” steps on the query. I have not found any good formal
models but personally think of them as “partition functions”.
# given
SELECT g, h, aggrfunc(t) FROM Table AS t WHERE P
GROUP BY t.g, t.h
78 CHAPTER 8. DATABASES
# we could write
partition_set = {(t.g, t.h) for t in Table}
rows_for(g, h) =
{t for t in Table: t.g == g && t.h == h && P(t)}
{(g, h, aggrfunc(rows_for(g, h))) for (g, h) in partition_set}
It’s kind of a mess but it can be worked with given patience. Please do not ask
me about window functions.
Getting data out of a database is only half of the challenge. The other half is get-
ting data into the database, and more importantly keeping it correct. We don’t want
records to be missing values or foreign keys, or have duplicate ids, or miss any of
the application specific requirements like “user balances must be above zero” or
“no more than ten records can be active at once.”
We can start by looking at some ways databases represent constraints:
These two tables define three constraints, each with its own special syntax. One
applies to each record in a table, one to every pair of records in the table, and one
to records between tables. All three constraints can be directly represented with
logical expressions. Starting with “user balances must be above zero”:
constraint UserEmailUnique =
all disj u1, u2 in Users:
u1.email != u2.email
The last constraint, a foreign key constraint, says that every user_group record has
a corresponding user record. Another way of thinking about this that if user_group.
user_id == 17, there must exist a user with id 17, which means the query some u in
users: u.id == 17 is true. In other words, foreign keys constrain each record to guar-
antee a query! For this reason we can represent key constraints with a nested quan-
tifier, placing a some inside an all.
constraint UserGroupUserFK =
all ug in user_groups:
some u in users:
ug.user_id = u.id
constraint UserGroupUserFK =
FK(user_groups, users, user_id, id)
Note
This raises a question: if all-some nested quantifiers have a deep meaning in
databases, does some-all mean anything? I have no idea. I welcome sugges-
tions from readers.
80 CHAPTER 8. DATABASES
Exercise 38
Write the constraint “If a user belongs to a group, the user must have a non-null
email”.
HINT: use =>.
Solution (page 140)
Exercise 39
Let #S be the number of elements in S. Write the constraint “all groups can only
have five members at most.”
HINT: Use set filter (page 12).
Solution (page 140)
constraint NoTimeTravel =
all b in books:
some a in authors:
1. b.author_id = a.id
2. b.published_on > a.birthday
constraint NoTimeTravel =
all b in books:
all a in authors:
b.author_id != a.id ||
b.published_on > a.birthday
We turned the some in NoTimeTravel into an all. We also changed the body to be
“either the author is different or the book was published after the author was born”.
While we could have used a => instead, SQL does not have an implication operator,
so writing !P || Q keeps us closer to the eventual implementation. This, combined
with BooksAuthorsFK forcing each book to have exactly one author, is equivalent to
our original constraint. Our next step is not strictly necessary, but will clarify our
final outcome:
constraint NoTimeTravel =
all (b, a) in books x authors:
b.author_id != a.id ||
b.published_on > a.birthday
Now for the insight. If this constraint does not hold, there must be a specific (book,
author) counterexample that violates it. And we can write a query to find the coun-
terexample! If the query turns up nothing we know there are no counterexamples,
meaning the constraint holds. This is just another example of logical duality!
82 CHAPTER 8. DATABASES
constraint NoTimeTravel =
!some (b, a) in books x authors:
!(b.author_id != a.id ||
b.published_on > a.birthday)
To actually enforce the constraint, we can use a SQL “trigger”, or a stored procedure
set to run on row or table changes. All we need to do is declare a trigger trigger that
makes this query and raises an exception if the query is nonempty. For brevity, an
example is separately provided with the code samples19 .
SQL triggers have one other useful feature: when triggered by a record update, they
can check constraints on how the record changed. We can for example enforce that
an updated_at timestamp can only go forwards in time or that when a nullable field
has a non-NULL value, it cannot be set back to NULL.
We have already represented changes in previous chapters: in change assertions
(page 48) we used old(x) and x, while in decision tables (page 69) we used x and x'
(“x prime”). SQL syntax uses NEW and OLD, but to make a later chapter (page 95)
easier I will use the prime syntax right now.
constraint NoNullAfterAdmin =
all g in Groups:
g.admin_id != NULL => g.admin_id' != NULL
Once the admin_id is not null, the next value cannot be null either, meaning that it
can never go back to null.
19 https://github.com/logicforprogrammers/book-assets
8.5. SUMMARY 83
This is also useful for state machine columns: a record can go WAITING -> READY or
READY -> DONE, but not WAITING -> DONE. In that case it’s considered good form to
“allow x to change to itself”:
constraint StateMachineTransitions ==
all t in tasks:
1. t.status = "WAITING" => t.status' in {"WAITING", "READY"}
2. t.status = "READY => t.status' in {"READY", "DONE"}
8.5 Summary
The database representation in this chapter comes from Edgar Codd’s Relational
Model. The relational model was first introduced in A relational model of data for
large shared data banks20 , along with a set of operators that made the Relational Al-
gebra. A gentler introduction to relational algebra can be found here21 . SQL is based
on relational algebra but does not follow it in its entirely.
The best way to learn about the capabilities of database invariants is to read the
official database documentation. While this chapter is compatible with SQLite, the
best documented is arguably Postgres:
• CHECK constraints22
• Trigger constraints23
20 https://dl.acm.org/doi/10.1145/362384.362685
21 https://cs186berkeley.net/notes/note6/
22 https://www.postgresql.org/docs/current/ddl-constraints.html
23 https://www.postgresql.org/docs/current/sql-createtrigger.html
Chapter 9
Data Modeling
Note
This is scheduled for a rewrite and needs to be updated after the databases
chapter was rewritten in v0.11
In the last chapter, we used logic to figure out database constraints. To do so, we
stuck close to database semantics: foreign keys are number columns, relationships
between entities go through a many-to-many table, etc.
Any database schema is only one possible representation of the abstract data model.
In this chapter, we will use logic to analyze the model directly.
record Users {
id: Int
}
record Groups {
id: Int
admin_id: Int
}
record GroupMembership {
id: int
user_id: int
group_id: int
}
I see three “implementation details” that don’t matter to the abstract model:
1. I don’t care whether the group id is an integer or a UUID or something else,
what I really care about is that the groups are distinct.
2. Why is admin_id an integer? Why can’t we just say the admin is a user? The
database needs an integer column, but in our heads, groups have admins, not
integers.
85
86 CHAPTER 9. DATA MODELING
sig User {}
sig Group {
admin: User
members: set User
}
I’m using “sig” for signature, because these are not records. They’re just a data
model, where groups have admins and sets of users. No implementation details
have leaked into my model!
(Though even this is biasing things a little: what if we instead wanted to have mem-
ber_of be an element of the User and not the group?)
One of the constraints from last chapter, that a group’s admin must also be a group
member, is easily expressed like this:
all g in Group:
g.admin in g.members
ods coin that we first introduces in an earlier chapter (page 58), just checking designs
instead of code.
There are many different formal specification languages, but the one I want to use
now is called Alloy24 . I’m not going to go into too many of the specifics of Alloy; that’s
beyond the scope of this book. But I’ll show you how it solves these problems.
First, we define the components of our data model and our constraints:
sig User {}
sig Group {
admin: User,
members: set User
}
pred admins_members_of_groups {
all g: Group |
g.admin in g.members
}
Note that Alloy uses a different syntax for quantifiers: all g: Group | prop instead of
all g in Group: prop.
Once we have the basics, we can write a “command”, telling Alloy to find examples
of systems where certain properties are true. In this case, ask it for examples of
groups containing all admins:
run group_with_all_admins {
admins_members_of_groups &&
some g: Group |
all u: User |
is_admin[u] => u in g.members
}
Running Alloy’s built-in analyzer (I use VSCode) gives us a visualization of the ex-
ample:
Alloy can also generate new examples to visualize, change the theme, and even run
a REPL on specific examples. It’s a great tool for finding unexpected situations!
24 https://alloytools.org/
88 CHAPTER 9. DATA MODELING
We can also ask Alloy to check that a property always holds. This is usually used
to check that we guarantee a data invariant. For example, we might want a data
invariant to be “groups are never empty”.
check no_empty_groups {
admins_members_of_groups =>
all g: Group | some g.members
}
9.2.1 Abstractions
record Users {
id: Int
+ referrer: Int + NULL
}
This implies a new data invariant: users cannot be their own referrer. As a SQL
constraint, it would look something like u.referrer != u.id. In Alloy, it would look like
this:
sig User {
referrer: lone User // 0 or 1
}
pred no_self_loops {
all u: User |
u != u.referrer
}
Now, one more twist to the constraint: no user can transitively be their own referrer.
If Alice refers Bob and Bob refers Eve, Eve cannot have referred Alice.
This is extraordinary difficult in SQL. At the very least we’d need use recursive com-
mon table expressions, and the resulting query will be convoluted and computa-
tionally expensive.
On the other hand, transitive lookups are trivial in Alloy. In Alloy, Alice.^referrer is
the “transitive closure” of referrals: the set containing Alice’s ref, the ref’s ref, the
ref’s ref’s ref, etc.
The same constraint in Alloy:
pred no_cycles {
all u: User |
!(u in u.^referrer)
}
90 CHAPTER 9. DATA MODELING
It’s good that we can express the constraint in Alloy, but that doesn’t help us with
our actual SQL database. SQL still doesn’t cleanly support transitive lookups.
But we can use Alloy to figure out an implementable SQL constraint that also guar-
antees no_cycles. Then we’d test
In this case we’d say that implementable_property is stronger than no_cycles. One
idea I have would be to place some ordering on users, like id or signup date. Then I’d
predict that if we could only refer someone with an earlier signup date, we wouldn’t
have any cycles. This would be relatively easy to check in SQL.
Exercise 41
Write the constraint (in our notation, not Alloy’s) “If a user has a referrer, the
user’s created_at is later than the referrers created_at.
Solution (page 140)
In Alloy:
sig User {
referrer: lone User,
created_at: disj Int
}
pred referral_must_come_later {
all u, ref: User |
u.referrer = ref => gt[u.created_at, ref.created_at]
}
Now we can check that our implementable constraint guarantees our data model
property:
check implementation_works {
referral_must_come_later => no_cycles
}
The main use-case of formal specifications is to find errors in designs. Design er-
rors are more expensive than code errors, and so are more important to detect early.
[[Since formal specifications live at a higher level of abstraction, they can more eas-
ily find design errors.]]
When I teach Alloy, I demonstrate this with a simplified model of access permis-
sions. We have a set of Users and Resources. Resources can only be read by Users
in their readable_by set.
sig User {}
sig Resource {
readable_by: set User
}
On top of this, we add that some resources have parents. If our resources are files,
the parent could be the containing folder. As with our prior example of referrals, no
resource can transitively be its own parent.
sig Resource {
readable_by: set User
+ ,parent: lone Resource
}
+ fact no_cycles {
+ no r: Resource |
+ r in r.^parent
+}
Finally, we amend the access rule, so that a user can access a resource if they have
permission to read its parent.
After adding this, I ask my class “if we can access a resource, are we guaranteed to
access all of its children?”
assert parent_implies_child {
all u: User, r: Resource |
can_access[u, r] =>
all child: r.~parent | //r.~parent is `children of r`
can_access[u, child]
}
check parent_implies_child
Most people are surprised to find out no, this property does not hold! As before, we
can see the counterexample as a graph visualizatin, but we can also output it as an
ASCII table:
┌─────────────┬───────────┬──────────┐
│this/Resource│readable_by│parent │
├─────────────┼───────────┼──────────┤
│Resource$0 │User$0 │ │
├─────────────┼───────────┼──────────┤
│Resource$1 │ │Resource$0│
├─────────────┼───────────┼──────────┤
│Resource$2 │ │Resource$1│
└─────────────┴───────────┴──────────┘
This is the real power of formal specification: the full spec is less than 30 lines and
still finds a subtle error many experienced developers miss. To fix this, we can mod-
ify readable_by to transitively check a resource’s entire ancestry.
(As before, we would still need to find a way to implement a transitive constraint in
our database. But it is always better to be working on implementing a correct design
than to implement a possibly-broken one.)
9.4 Summary
1. We can represent data (or other systems) at a higher level of abstraction than
what the database implements.
2. By doing this, we can test the abstractions directly, in a formal specification
language.
3. Alloy is one such formal specification language, and can produce visualiza-
tions of satisfying properties. It can also test that properties hold.
4. Because we’re at a higher level of abstraction, we can express invariants that
would be impossible to directly enforce at the database level.
5. Alloy can test if an implementable constraint also guarantees an abstract in-
variant.
While data modeling is a good use case for formal specification, it really shines for
modeling concurrent systems. In the next chapter, we will show how a formal spec-
ification can find race conditions in a software design.
• Alloy Docs25
• Formal Software Design with Alloy 626
• Software Abstractions27 (book)
25 https://alloy.readthedocs.io/en/latest/
26 https://haslab.github.io/formal-software-design/
27 https://mitpress.mit.edu/9780262528900/software-abstractions/
94 CHAPTER 9. DATA MODELING
28 https://bytes.zone/posts/modeling-database-tables-in-alloy/
29 https://bytes.zone/posts/modeling-git-internals-in-alloy-part-3-operations-on-blobs-and-trees/
30 https://jwbaugh.github.io/papers/baugh-abz-2016.pdf
Chapter 10
System Modeling
In the last chapter, we showed how formal specification can be used to analyze a data
model and look for problems. But that’s only the tip of the specification iceberg. We
can also use it to model systems.
10.1 Situation
We have some bank users. Bank users can wire money to each other. We have over-
draft protection, so wires cannot reduce an account value below zero. That’s easy to
guarantee, just throw an if check on each wire and you’re done!
…But what if users can send multiple wires at the same time? What if a computer
crashes in the middle of processing a wire? What if someone tries to send themselves
money? What if someone tries to send themselves money in multiple wires at the
same time, and then one of the servers crash?
This is why we need to model systems. We want to see that our properties hold under
every possible behavior, not just on the happy path.
And we’ll use logic to model it.
We’re going to handle this system in three stages. First, we’ll see how our regular
predicate logic is enough to accurately model our problem. Then, we’ll make a sim-
ple extension to our logic to more elegantly express the spec. Finally, we’ll translate
it to a real tool that can directly check our logic for errors.
For now we’ll assume an extremely simple system: two hardcoded variables alice
and bob, both start with 10 dollars, and transfers are only from Alice to Bob. Also,
the transfer is totally atomic: we check for adequate funds, withdraw, and deposit
all in a single moment of time. Our modeled system will be more complex; this is
just to relate the ideas.
First, let’s look at a valid behavior of the system, or possible way it can evolve.
95
96 CHAPTER 10. SYSTEM MODELING
In programming, we’d think of alice and bob as variables that change. How can we
express those variables purely in terms of predicate logic? One way would be to re-
place them with arrays of values. alice[0] is the initial state of alice, alice[1] is after
the first time step, etc. Time, then, is “just” the set of natural numbers.
The first is invalid because Bob received more money than Alice lost. The second is
invalid because it violates our proposed invariant, that accounts cannot go negative.
Can we write a predicate that is true for valid transitions and false for some transition
in our two invalid behaviors?
Here’s one way:
Time = Nat
Transfer(t: Time) =
some value in 0..=alice[t]:
1. alice[t+1] == alice[t] - value
2. bob[t+1] == bob[t] + value
Go through and check that this is true for every t in the valid behavior and false for
at least one t in the invalid behavior. Note that the steps where Alice doesn’t send a
transfer also pass Transfer; we just pick value = 0.
I can now write a predicate that perfectly describes what a “valid behavior” is:
Spec =
1. alice[0] == 10
2. bob[0] == 10
3. all t in Time:
Transfer(t)
Now allowing “nothing happens” as “Alice sends an empty transfer” is a little bit
weird. In the real system, we probably don’t want people to constantly be sending
each other zero dollars:
10.2. THE LOGIC 97
Transfer(t: Time) =
- some value in 0..=alice[t]:
+ some value in 1..=alice[t]:
1. alice[t+1] == alice[t] - value
2. bob[t+1] == bob[t] + value
But now there can’t be a timestep where nothing happens. And that means no be-
havior is valid!
Spec =
1. alice[0] == 10
2. bob[0] == 10
3. all t in Time:
|| Transfer(t)
|| 1. alice[t+1] == alice[t]
2. bob[t+1] == bob[t]
(This is also why we can use infinite behaviors to model a finite algorithm. If the
algorithm completes at t=21, t=22,23,24... are all stutter steps.)
There’s enough moving parts here that I’d want to break it into subpredicates.
Init =
1. alice[0] == 10
2. bob[0] == 10
Stutter(t) =
1. alice[t+1] == alice[t]
2. bob[t+1] == bob[t]
Spec =
(continues on next page)
98 CHAPTER 10. SYSTEM MODELING
Now finally, how do we represent the property NoOverdrafts? It’s an invariant that
has to be true at all times. So we do the same thing we did in Spec, write a predicate
over all times.
property NoOverdrafts =
all t in Time:
alice[t] >= 0
We can even say that Spec => NoOverdrafts, ie if a behavior is valid under Spec, it
satisfies NoOverdrafts.
This is good and all, but in practice, there’s two downsides to treating time as a set
we can quantify over:
1. It’s cumbersome. We have to write var[t] and var[t+1] all over the place.
2. It’s too powerful. We can write expressions like alice[t^2-5] == alice[t] + t.
Problem (2) might seem like a good thing; isn’t the whole point of logic to be expres-
sive? But we have a long-term goal in mind: getting a computer to check our formal
specification. We need to limit the expressivity of our model to make it tractable to
our tooling.
In practice, this will mean making time implicit to our model, instead of explicitly
quantifying over it.
[[The first thing we need to do is limit how we can use time.]] At a given point in
time, all we can look at is the current value of a variable (var[t]) and the next value
(var[t+1]). No var[t+16] or var[t-1] or anything else complicated.
10.2. THE LOGIC 99
And it turns out we’ve already seen a mathematical convention for expressing this:
priming (page 69)! For a given time t, we can define var to mean var[t] and var' to
mean var[t+1]. Then Transfer(t) becomes
Transfer =
some value in 1..=alice:
1. alice' == alice - value
2. bob' == bob + value
Next we have the construct all t in Time: P(t) in both Spec and NoOverdrafts. In other
words, “P is always true”. So we can add always as a new term. Logicians conven-
tionally use or [] to mean the same thing.
property NoOverdrafts =
always (alice >= 0 && bob >= 0)
// or [](alice >= 0 && bob >= 0)
Spec =
Init && always (Next || Stutter)
Now time is almost completely implicit in our spec, with just one exception: Init has
alice[0] and bob[0]. We just need one more convention: if a variable is referenced
outside of the scope of a temporal operator, it means var[0]. Since Init is outside of
the [], it becomes
100 CHAPTER 10. SYSTEM MODELING
Init =
1. alice == 10
2. bob == 10
And with that, we’ve removed Time as an explicit value in our model.
The addition of primes and always makes this a temporal logic: one that can model
how things change over time. And that makes it ideal for modeling software sys-
tems.
Note
You don’t have to make a temporal logic to analyze systems. Before 2022, Alloy
users modeled systems by making an explicit Time signature. But this proved
to be cumbersome, so in 2022 Alloy incorporated a temporal logic model.
Regardless, we’ll be using a specification language was that designed with tem-
poral logic from the ground up.
One of the most popular specification languages for modeling these kinds of con-
current systems is TLA+. TLA+ was invented by the Turing award-winner Leslie
Lamport, who also invented a wide variety of concurrency algorithms and LaTeX.
Here’s our current spec in TLA+:
Init ==
alice = 10
/\ bob = 10
AliceToBob ==
\E amnt \in 1..alice:
alice' = alice - amnt
/\ bob' = bob + amnt
Next ==
AliceToBob
\/ BobToAlice
NoOverdrafts ==
[](alice >= 0 /\ bob >= 0)
====
TLA+ uses ASCII versions of mathematicians notation: /\ and \/ for &&/||, \A and
\E for all/some, etc. == is used for definition, and [][Next]_vars is TLA+ notation for
[](Next || Stutter).
Now that we have a specification and a property, we can use a model checker to gener-
ate all possible states of this system and see if any of them break our invariant. Like
Alloy, TLA+ is most often checked from VSCode via an extension31 . But setting up a
model run takes a bit of configuration, so I created a tool called tlacli32 to do more
from the command line. It doesn’t support all of TLA+’s features but is suitable for
quick demos like this.
So this is all well and good for our simple model, but what if more than one trans-
action could be in flight at the same time? Does our invariant still work if with con-
current, nonatomic transactions?
31 https://github.com/tlaplus/vscode-tlaplus/
32 https://github.com/hwayne/tlacli
102 CHAPTER 10. SYSTEM MODELING
We could add concurrency to our “pure” TLA+, [[but that requires a few “TLA+-isms”
I don’t feel like explaining right now.]] So instead we’re going to use PlusCal, a lan-
guage that compiles to TLA+. It’s built-in with the TLA+ tooling and looks more like
programming language than a math formula, so it’s very popular with beginners.
(* --algorithm wire
variables
acct \in [People -> Money];
define
NoOverdrafts ==
[](\A p \in People:
acct[p] >= 0)
end define;
====
Most of this looks like a programming language with some unusual syntactic choices,
but there’s some things to pay attention to. acct is set to any value of [People ->
10.3. IN PRACTICE: TLA+ 103
Money], roughly the set of all mappings of Alice and Bob to numbers between 1 and
10. So acct can start as {alice: 1, bob: 10}, {alice: 3, bob: 6}, or any of the other 98
possible combinations.
Our model also starts with two distinct wires simultaneously (process wire \in 1..
NumTransfers where NumTransfer == 2). Each wire has its own amnt, from, and to,
which are also individually elements of sets. Different wires can pick different local
values for this. Between this and the amnt, there are 40,000 possible initial states.
Inside wire we have Check:, Withdraw:, and Deposit:. These are labels, or groups of
atomic actions. Each wire takes three steps to fully process: checking the balance
is one step, withdrawing is one step, and depositing is one step. To a first order
approximation, there are twenty possible ways the two wires can interleave. More
precisely, slightly fewer, because some wires will end early at the Check.
Finally, NoOverdrafts is a straight translation of our old version, just generalized to
any number of people.
To compile the PlusCal to TLA+, I ran tlacli translate transfers2.tla. The translation is
done in-file and appears below the code. Now let’s model check it and see if NoOver-
drafts still holds:
This bug happens because checking and withdrawing are nonatomic: they happen
in different time steps. If we make them happen in the same time step, the error
should go away:
begin
- Check:
+ CheckAndWithdraw:
if acct[from] >= amnt then
- Withdraw:
acct[from] := acct[from] - amnt;
+ \* remember to retranslate the file!
We can rerun the model checker and see that the error no longer occurs. If we want,
we can set NumTransfers to 6 or add another three people, and TLA+ will seamlessly
check our larger problem. This is what makes formal specification so useful for
complex systems!
10.3.2 Liveness
If you look at the translation, you’d see this extra property PlusCal generated:
ProcSet is the set of all wires (so 1..2). pc tracks the current step of each process:
pc[1] = "Deposit" means which process 1 is ready to deposit. The whole quantifier is
then “every process is at the ‘Done’ step.”
What about the <>?
Remember how [] was always, and meant all t in Time? <>, or eventually, instead
means some t in Time.
Termination =
some t in Time:
all self \in ProcSet:
pc[self][t] = "Done"
<>P means that P doesn’t need to be true at the start, but it needs to eventually be-
come true in all possible timelines. This gets to one of the most powerful features of
TLA+. Our invariant was a kind of safety property: a promise that something “bad”
doesn’t happen. The other half of the coin is the liveness property: something “good”
is guaranteed to happen. Like, for example, our processes eventually finish pro-
cessing.
We can check Termination with tlacli check wire.tla --prop Termination. Surprisingly,
it fails:
10.4. SPECIFICATION IN THE WILD 105
State 4:
/\ acct = [alice |-> 0, bob |-> 1]
/\ amnt = <<1, 1>>
/\ to = <<"alice", "alice">>
/\ from = <<"alice", "alice">>
/\ pc = <<"Deposit", "Done">>
State 5: Stuttering
“Stuttering” is TLA+-speak for “crashes”. The first wire is almost finished, it just has
to complete “Deposit”, but crashes just before. Bob never gets his money.
By default, TLA+ assumes any process can crash at any step. [[It’s better to assume
maximum perversity and force users to make their assumptions explicit.]] If we
want to say the process doesn’t crash, we have to make it “fair”:
Not all liveness bugs are solved so easily. Often, fixing a liveness bug requires re-
thinking the fundamental design. Better to do that rethinking while we’re still in the
design phase, as opposed to after we released the product.
The past two chapters covered two different formal specification languages: TLA+
and Alloy. When people learn about these kinds of tools, they generally have two
questions:
1. Is this actually used in the real world?
2. How do I make sure my code matches the specification?
106 CHAPTER 10. SYSTEM MODELING
Question one is easy to answer: there are a lot of high-profile case studies of formal
specification saving everyday companies a lot of time and money. I’ve put some
examples in the “Further Reading” section.
Question two is harder. As we’ve seen in the functional correctness (page 41) chapter,
formal verification of code (page 58) is hard. Code needs to worry about a lot more
things than specifications do. Our transfer model abstracted away everything from
the specific packages we use to the “insufficient funds” dialog we show to users.
That level of abstraction is what makes specification so powerful in the first place;
verifying code loses that power.
(There are some specification languages that can “refine” spec into code, such
as Event-B33 . These tend to be significantly more difficult and expensive to use,
though.)
But the field of formal specification is young and we’re starting to see some interest-
ing developments. The most exciting innovation, in my opinion, is using a formal
spec to generate tests. You can see ths in the paper eXtreme Modeling in Practice34 ,
where they used a TLA+ specification to generate a test suite for a C++ library.
10.5 Summary
1. TLA+ is a form of logic used to model software systems and express their prop-
erties.
2. You can model check a TLA+ specificaton to find timelines which break proper-
ties.
3. We can check that a property is true for all states in all timelines, or at least
one state in each timeline.
The last two chapters covered two uses of and two tools for formal specification.
But this is just the tip of the iceberg: specification is a rich field with all sorts of
interesting languages and applications. I’ve worked with specification languages
for modeling probability, robotics, system dynamics models, and even corporate
bureaucracies!
Formal specification and formal verification together for formal methods, the disci-
pline of directly applying math to write code. I’ve found formal specification lan-
guages (on abstract models) more useful to industry than formal verification lan-
guages (on actual code), mostly because it’s easier to learn and a lot cheaper to in-
corporate into a regular development workflow.
33 https://www.event-b.org/
34 https://arxiv.org/abs/2006.00915
10.5. SUMMARY 107
In the next chapter, we will leave formal methods behind and focus on a different
class of practical problems elegantly solvable with logic.
TLA+:
• Learn TLA+35
• Specifying Systems36 is the canonical textbook.
Case Studies:
• Finding bugs without running or even looking at code37 [video]
• Use of Formal Methods at Amazon Web Services38
35 https://www.learntla.com
36 https://lamport.azurewebsites.net/tla/book.html
37 https://www.youtube.com/watch?v=FvNRlE4E9QQ
38 https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf
Chapter 11
Solvers
Let’s say we have a set of T tests, and each takes a different amount of time to run.
T1 might take 1 second, T2 7 seconds, etc. We can divide the tests among N identi-
cal servers. We want to distribute them to minimize the overall testing time. If two
servers take 1 second to run all of its tests and a third takes 27 seconds, the overall
testing time is 27.
Fig. 11.1: Two different assignments with different overall test times.
To make the problem more interesting, some tests may belong to a group, and we
want all tests in the same group to run on the same server.
This kind of problem is difficult to solve in a normal programming language, but it’s
very easy to express logically, and lets us use a new class of tools to solve them.
11.1 Logic
Normally, we use logic because it is more expressive than the average programming
language. But this time we’ll do things a little differently and write our problem in a
less expressive way. Trust me, I have a reason for doing things this way.
First, how can we represent the test times? With a test_times array: test_times[2] is
the time the second test takes. Unlike everything else in this book, this is 1-indexed.
Next, how can we represent the server assignments? If we say our servers are rep-
resented by integers (just like our tests) we can also represent the server with an
108
11.1. LOGIC 109
Note
This is a different meaning of “constraint” than the “constraints” in the
database chapter. Different disciplines, different etymologies. If I was logi-
cally analyzing both databases and this class of problems at the same time, I’d
probably call the database constraints “rules” or something.
constraint GroupsHaveSameAssignment =
all t1, t2: 1..=T:
(group[t1] == group[t2] &&
group[t1] != 0)
=> assignment[t1] == assignment[t2]
(Exercise for the student: why don’t we need to check group[t2] != 0?)
The total time a server takes is the sum of all tests assigned to it.
total_time(s: 1..=S) =
sum({
test_times[t]
for t in 1..T:
assigment[t] == s
})
Now that we have the problem logically represented, how do we solve it?
By using a solver, of course!
A solver is a special tool that finds answers to problems like ours. There are all sorts
of solvers, some for special problems, some more general purpose.
We will use Minizinc39 , purely because you can try it free online40 . You can also
download it and run it locally, which will be faster on most machines. Here is the
solution in MiniZinc:
output [
"Server \(s): \([t | t in Tests where assignment[t] = s])"
++ " (\(sum([test_times[t] | t in Tests
where assignment[t] = s])))\n"
(continues on next page)
39 https://www.minizinc.org/
40 https://play.minizinc.dev
11.2. IN PRACTICE: SOLVERS 111
As with the other tools we have seen, MiniZinc uses its own syntax for logical expres-
sions that is different from ours. I picked arbitrary values for test_times and group;
MiniZinc can also read parameters in from a data file. We have to mark assignment
as a var so MiniZinc knows it can control that.
Running this will output progressively better results, until it finds a minimum time
of 26. If MiniZinc can’t find any valid solution it will output UNSATISFIABLE.
Solvers make it easy to add more interesting constraints. For example, we can add
a constraint that each server must have about the same number of tests:
The above is known as a “bin-packing problem”, which is one of the most popular
use-cases for these solvers. Finding the optimium solution doesn’t matter too much
with ten tests, but a more realistic workload might be 10,000 tests! That’s when a
solver can really save a business money.
Our use of MiniZinc explains why we had to store our parameters in such an inex-
pressive format. MiniZinc does not support strings, structures, arrays of arrays, or
any of the affordances we’re used to in programming languages. Even with its re-
strictions, MiniZinc is still more expressive than many other solvers. This because
solvers need to be fast. The more restrictions we place on the variables and con-
straints in our problems, the more specialized our solver can be, and in turn the
faster we can solve expressible problems.
For example, a linear programming solver has only numbers for values, and can only
compute expressions of the form a*x1 + b*x2 + .... But dedicated linear solvers can
run much faster than a general MiniZinc problem.
[[So why not convert a MiniZinc problem into a linear programming one, if possible?
In fact, that’s exactly what it does. MiniZinc is a high-level, tool-agnostic language
for expressing constraint problems, which it tries to “compile” into simpler forms.]]
There’s a dizzying array of subclasses of problems and solvers, but two are of partic-
ular interest, showing what can exist on the very ends of the speed/expressiveness
spectrum.
112 CHAPTER 11. SOLVERS
Note
Some classes of problems are valuable enough for specialist tools. Google
OR-tools41 is one of the most popular solvers available, and has specialized
solvers for problems like scheduling, vehicle routing, and bin packing.
If all we cared about was speed, what is the most stripped down, barebones con-
straint language we can make?
We get it by removing everything. No strings, arrays, no functions, no numbers. Ev-
ery variable is “true” or “false”. If we want to see whether test 7 is to server 3, we
make a boolean variable tracking that. a73 is true when test 7 is assigned to server
3, !a73 when it isn’t. But then we’ll also need a72, a71, a63…
Then we’ll need a constraint saying “each test is assigned to at least one server.”
What’s the simplest possible way to write that constraint? Probably something like
this:
But now our solver has to understand both “AND of ORs” and “OR of ANDs”. That’s
too much expressivity! Better find a way to rewrite that in CNF:
41 https://developers.google.com/optimization
11.2. IN PRACTICE: SOLVERS 113
Exercise 47
Explain why this means test 1 can’t be assigned to two different servers.
Solution (page 142)
Solvers for problems of this form are called SAT solvers, after “Boolean SATisfiabil-
ity”. The syntax may seem constrained, but a surprisingly large number of prob-
lems can be transformed into SAT problems. And in return, SAT solvers are some
of the fastest solvers in the world, routinely handling problems gigabytes in size.
Note
Maybe an exercise on converting the tag rule to SAT. We don’t need to actually
make the tags parameters, just replace them with clauses.
Note
The set of all problems solvable by a SAT solver is the “NP complexity class”, of
P vs NP fame. NP is considered “intractable”, meaning we do not have efficient
algorithms to solve all of the problems in the class. Most SAT problems seen in
the wild are “well-behaved” and can be solved quickly. But you can also find
small SAT problems that can’t be solved in a human lifetime.
For this reason, many people use tools that take expressive forms and convert them
into SAT. You can think of SAT as a low level “assembly” language that other tools
“compile” to. Alloy (page 87) uses a SAT solver internally.
11.2.3 SMT
On the other end of expressivity, we have SMT, or Satisfiability Modulo Theories solver.
While other solvers target a restricted category of math problem, SMT solvers are
flexible and handle a wide range of different problems. As just one example, we can
use it to reverse engineer a random number generator (RNG).
One old type of random number generator is the Linear Congruential Generator, or
LCG. Starting with a seed value x_0, each next value is determined by x_n+1 = (a*x_n
+ c) % m, where (a, c, m) are all fixed values. Given a sequence and m, can we recover
(a, c)? The most popular SMT in use is Z342 .
42 https://microsoft.github.io/z3guide/
114 CHAPTER 11. SOLVERS
a = Int('a')
c = Int('c')
if solver.check() == sat:
model = solver.model()
print(f"a = {model[a].as_long()}")
print(f"c = {model[c].as_long()}")
else:
print("Could not find parameters")
SMT are more expressive than even “generic” constraint solvers, but that expres-
siveness comes at a price of completeness. All of the prior constraint problems we
looked at were decidable, meaning the solver will either definitely return a value or
definitely tell us there is no solution. SMT solvers can also return “UNKNOWN”,
meaning the solver couldn’t figure if the problem is even solvable or not. Over time,
SMT solvers are getting better and better at finding solutions, but they will never be
able to solve all problems. Such is the price of expressiveness.
Nonetheless, SMT solvers are incredibly popular for their flexibility and see all sorts
of different use cases. They can crack cryptographic primitives43 , reverse engineer
compiled binaries44 , find differences in firewall rulesets45 , and synthesize code
43 https://github.com/kste/cryptosmt
44 https://docs.angr.io/en/latest/core-concepts/solver.html
45 https://github.com/Z3Prover/FirewallChecker
11.3. WHICH TO USE? 115
from specifications46 . They are also the engine that powers most work in theorem
proving and formal verification (page 58). Dafny (page 58) uses an SMT solver to verify
code, and one research model checker for TLA+ (page 100) does too.
Note
How can a solver (which finds if solutions exist) power a verifier (which checks
that a property always holds)? Easy, just exploit quantifier duality (page 19).
If the property is all x: P(x), then ask the solver to satisfy !P(x). If the solver
can’t find any solutions, then !(some x: !P(x)) holds, which is equivalent to our
property.
So given all of the options, which solver is the right one to pick? This depends on a
lot of factors, but I can provide some very general heuristics.
First of all, most programmers are unlikely to directly use SAT. SAT is fast and pow-
erful but it takes a lot of skill to represent problems in an optimal way, and to inter-
pret a SAT solution to a problem. For these reasons, the main users of SAT solvers
are people who 1) absolutely need the maximum possible performance on their
problem, or 2) are building higher-level tools. We are more likely to use a tool that
uses a SAT solver as part of its implementation.
After that, the right solver depends on the nature of our goal: is it satisfaction or op-
timization? SMT solvers are the right tool when the problem has very few solutions
and any one will do. Other solvers are the right tool when the problem has many
valid solutions, but some are more optimal than others. As a rough rule, most “satis-
faction” problems are technical/”software engineering”-oriented, while most opti-
mization problems are business/”operations research”-oriented: shift scheduling,
vehicle routing, manufacturing optimization, resource allocation, etc.
If SMT is the right choice, then we default to Z3. It is most widely supported, has
integrations with the most languages, and has the most thorough documentation of
any SMT solver.
If an optimization solver is the right choice, the general approach is to find the
least expressive class of tools that totally expresses the problem, as that will be the
fastest. [[ILP, LP, Simplex, MIP, etc]]. There’s a dizzying plethora of different tools
and classes. A decent enough starting point is MiniZinc and Google’s OR-Tools47 .
46 https://cseweb.ucsd.edu/~npolikarpova/publications/popl20.pdf
47 https://developers.google.com/optimization
116 CHAPTER 11. SOLVERS
11.4 Summary
48 http://www.hakank.org/common_cp_models/
49 http://www.hakank.org/minizinc/
Chapter 12
Logic Programming
In a book called “Logic for Programmers” I’ve somehow managed to not bring up
“logic programming” for ninety pages. I feel like I deserve a medal.
Logic programming (LP) is a distinct paradigm of programming, like imperative and
functional are. The most famous logic programming language is called “Prolog”.
Prolog was first created in 1970s and since then has split off into many different
variants. We will use the “SWI-Prolog” variant, which you can try online at https:
//swish.swi-prolog.org/. By necessity, this chapter will be even more of a broad
overview than the other chapters.
12.1 Prolog
This defines the fact ingredient() and determines it’s true if the first parameter is
bread and the second is flour or water. Now if I call ingredient with a variable, I am
asking the Prolog engine to find a value that makes my expression true.
ingredient(X, flour).
X = bread % result
ingredient(X, potatoes).
false % no possible X
This maps directly to the some quantifier: ingredient(X, flour) is true if some x:
ingredient(x, flour). If multiple values satisfy the expression, then Prolog will return
117
118 CHAPTER 12. LOGIC PROGRAMMING
parent(a0, a1).
parent(a1, a2).
parent(a2, a3).
parent(a3, a4).
parent(a4, a5).
parent(a1, b1).
parent(b1, b2).
parent(b2, a4).
parent(b2, b3).
Once we have a collection of facts, we can then add “rules”, or predicates with com-
plex bodies. For example, a “merge commit” is one that has two different parents.
mergecommit(C) :-
parent(P1, C),
parent(P2, C),
\+ (P1 = P2). % \+ is 'not'
?- mergecommit(C).
C = a4.
Rules can have multiple definitions, in which case the predicate is true if any rule is
true. This makes it easy to express recursive statements, like “A is the ancestor of C
if it is a parent of C or the parent of an ancestor of C.”
12.2. DEDUCTIVE DATABASES 119
With this, we can express complex queries, like “ancestors of commit A that are not
ancestors of commit B”:
% \+ is "not"
?- ancestor(a5, X), \+ ancestor(b3, X).
X = a4 ;
X = a3 ;
X = a2
Note
Prolog uses a “backtracking” algorithm to find solutions. As such, it does not
guarantee that all solutions are unique.
% commit_author_file
caf(C, A, F) :-
commit(C, A, Files),
member(F, Files).
file(f1) is a prolog compound term, equivalent to a struct or product type in other lan-
guages. It can be used in [[pattern matching]]: caf(_, alice, testfile(X)) will retrieve
any testing file (but not regular file) that Alice modified.
[[bridge]]
[[In their paper Evidence Based Failure Prediction, Nagappan et al argue that patterns
in how we change files can predict the likelihood of bugs in those files. Files with
more commits, “churn”, are more likely to have latent bugs. Let’s implement two
rules that suggest a file is more likely to have bugs:]]
1. high_churn is true if a file was changed by at least three different commits with
different authors. In our model, this applies to just file(f1).
2. untested_commit is true if a file was changed in a commit, and its correspond-
ing test was not changed. This check should not apply to test files. In our
model, this applies to file(f1) and file(f3).
high_churn(File) :-
caf(_, A1, File), caf(_, A2, File), caf(_, A3, File),
A1 @< A2, A2 @< A3. % @< = ordering on atoms
untested_commit(file(File)) :-
commit(_, _, Files),
member(file(File), Files),
\+ member(testfile(File), Files).
We can write high_churn more elegantly (and not hardcode in the number of au-
thors), but that requires more sophisticated Prolog technique. From here, we can
map each file to the number of checks it fails.
checks_failed(File, Failed) :-
checks_failed(File, [high_churn, untested_commit], Failed).
If we just want the count of the checks each file failed, we can write another helper
operator.
file_suspicion(File, Suspicion) :-
checks_failed(File, Failed),
length(Failed, Suspicion).
In practice, Prolog is rarely used as the query language for deductive databases
for two reasons: Prolog cannot be embedded in other languages, and Prolog
queries are not guaranteed to terminate. The main language is instead Datalog, a
“well-behaved” subset of Prolog without these issues. Datomic50 , for example, uses
datalog for queries, but embeds it as a DSL in Clojure.
12.4 Planning
There is one class of AI problems that (as of 2025) cannot be handled with statistical
approaches: planning. Given a starting state, a set of valid actions, and a goal state,
what sequence of actions should get us to the goal state?
Consider the following situation: we have a set of online servers that need two OS
updates. We can only upgrade a server that is offline, and we need to make sure
that we always have at least one server online. We can further abstract the servers
so that they consist only of a name, a boolean on/off state, and a numerical version.
In this case, the starting state for two servers would be the set {(s1, on, 1), (s2, on,
1), the goal state would be {(s1, on, 3), (s2, on, 3), and there would be two possible
50 https://www.datomic.com/
122 CHAPTER 12. LOGIC PROGRAMMING
actions:
• Toggle the state of a server, unless doing so would leave all servers off
• Increment the version of an off server.
Prolog does not natively support planning, but my personal favorite logic language,
Picat51 , does. Here is the planning problem in Picat:
final(N) =>
foreach($server(_, State, Version) in N)
State = on, Version = 3
end.
cost(State) = 1.
main =>
Start = [$server(s1, on, 1), $server(s2, on, 1)],
best_plan(Start, Plan, Cost),
writeln(Plan), writeln(Cost).
I coded the first output to be the list of steps that solves our problems. The second
output is the “cost” of the plan, which Picat will automatically minimize. In this case,
each action has cost 1, meaning to total cost is just the number of steps in our plan.
The planner is automatically able to find a sequence of steps that solves our prob-
lem. It can also minimize cost. In this case, the “cost” is just the number of steps,
leading to an eight-step solution. To showcase we can add an additional penalty for
having several online servers with different versions, equal to (MaxVersion - MinVer-
sion) cubed.
Our program looks very similar:
- cost(State) = 1.
+ cost(State) = Out =>
+ member($server(_, on, Vmin), State).minof(Vmin),
+ member($server(_, on, Vmax), State).maxof(Vmax),
+ Out = 1 + max(0, Vmax - Vmin)**3.
With these changes, the eight-step solution would have a total cost of 16. Picat, in-
stead, finds a longer solution with a smaller cost:
Planning is mostly used in AI research and especially in video game AIs, where it is
called Goal-Oriented Action Planning.
12.5 Summary
General Topics:
• Association for Logic Programming: https://logicprogramming.org/
• The Power of Prolog: https://www.metalevel.at/prolog
• Logic Programming Courseware: https://athena.ecs.csus.edu/~mei/logicp
Other Logic Programming Languages:
• miniKanren: https://minikanren.org
• Datalog: https://www.learndatalogtoday.org/
• Picat: http://picat-lang.org/ and http://picat-lang.org/picatbook2015.html
• Answer set programming: https://potassco.org/
LP Case Studies:
• IBM Watson used Prolog for natural language processing: https:
//www.cs.miami.edu/home/odelia/teaching/csc419_spring19/syllabus/
IBM_Watson_Prolog.pdf
• The JVM uses Prolog in the typechecker: https://docs.oracle.com/javase/
specs/jvms/se10/jvms10.pdf
• The Pubgrub package resolution algorithm uses Answer set programming:
https://github.com/pubgrub-rs/pubgrub
Appendix A
Math Notation
I used programmer symbols and my own syntax through this book; mathematicians
use different symbols. I did this because I wanted everything to be easily greppable
and inferable from context. If you’re seeing ∪ for the first time, it’s really hard to
look up what it means!
125
126 APPENDIX A. MATH NOTATION
Take the expression all x in set: P(x). Here are three different ways mathematicians
write it:
∀𝑥 ∈ 𝑠𝑒𝑡 : 𝑃 (𝑥)
∀𝑥.𝑠𝑒𝑡(𝑥) → 𝑃 (𝑥)
∀𝑥 : 𝑠𝑒𝑡|𝑃 (𝑥)
Some mathematicians write ∃!𝑥 to mean “there exists exactly one x”, but it’s not by
any means a universal convention.
A.3 Tautologies
I wrote the double negative rewrite rule as !!P = P. To be more mathematically pre-
cise, given !!P, we can prove P. Three ways you could write this:
¬¬𝑃
∴𝑃
¬¬𝑃 ⊢ 𝑃
¬¬𝑃 → 𝑃
Which one a mathematician uses can depend on the particular field they publish in.
In the last case, they will reserve → to mean “we can prove” and exclusively use ⇒
to mean “implies”.
Appendix B
Useful Rewrite Rules
B.1 Table of Tautologies
Some of these are =, to mean the two formulas are identical- you can substitute one
for the other. Some are =>, meaning they only go one way.
Propositional Logic:
P = !!P
P && !P = False
P || !P = True
De Morgan’s law:
!P && !Q = !(P || Q)
!P || !Q = !(P && Q)
B.1.1 Implication
Definition:
P => Q = !P || Q
!(P => Q) = P && !Q
Contrapositive:
P => Q = !Q => !P
Transitivity: (P => Q && Q => R) => (P => R) Note it’s not an =! It doesn’t go both ways!
127
128 APPENDIX B. USEFUL REWRITE RULES
B.1.2 Quantifiers
Extraction:
Duals:
Commutativity:
all x in S, y in S: ... =
all x, y in S: ...
some x in S, y in S: ... =
some x, y in S: ...
some x: all y can be replaced with all y: some x, which is stronger. You cannot go the
other way!
Distributivity:
Note
EVERYTHING in the section needs to be thoroughly checked against a mathe-
matician. Also it might be thrown out entirely if it’s too long and not helpful.
Under construction.
This entire book is about classical first order logic. That’s the logic that most math-
ematicians use to do math. But mathematics is flexible and mathematicians hate tak-
ing a system for granted. So many mathematicians have asked “what happens if we
make logic different?”
This is some of about those ways of making logic different.
For every set s, we can create a predicate S(x) = x in s. This means every set defines
a predicate. Is the opposite true: for every predicate, can we find a set of all things
that pass that predicate?
->
In naive set theory, this is true for all sets. Naive set theory has a problem, though, that
lead to mathematicians abandoning it over a century ago. Consider the predicate
Evil(x) = x not in x, aka x is not a set that contains itself. Most sets would pass this
predicate: {}, {1}, {[1, 3], abc}, etc. Some sets would fail this predicate, like “the set
of all non-empty sets”. If all predicates formed sets and vice versa, we’d have a set
evil, the set of all sets that don’t contain themselves.
Now is Evil(evil) true? If so, evil doesn’t contain itself, meaning it’s not in evil, so
Evil(evil) is false by definition, but then it’s not in evil, meaning it doesn’t contain
itself, meaning Evil(evil) is true…
This is called “Russell’s paradox” and is considered a reason that not all predicates
form sets. The paradox drove much of the development of formal logic in the early
129
130 APPENDIX C. BEYOND LOGIC
20th century in order to find ways to avoid the paradox. The most mainstream solu-
tion to this is called “ZF” Set Theory, but another is modern type theory, which has
found a home in modern functional pro
Most mathematicians prefer to stick with first-order logic because higher order logic
is too “powerful”. As one logician I interviewed put it, “you don’t want your logic
suddenly building a rocket ship.” A rough analogy would be to Turing completeness
in computer science. It’s harder to analyze a Turing complete language than a more
limited one.
[[Applications: type theory, theoretical computer science I think]]
At the very very beginning of the book, I said “all predicates return true or false”.
This is the Law of Excluded Middle: any statement is true or false. There is no “third
thing” a statement can be.
This leads to something unusual: if you want to prove something true, you have
the option of proving it “non-false”. And if you want to prove a set is nonempty,
you can instead prove that it’s impossible for the set to be empty. This is called a
“non-constructive proof”, since you aren’t actually constructing a value inside that
set.
My favorite example of this is Chess. Imagine we are playing a slight variation where,
instead of White always moving first, they can choose whether to move first or sec-
ond. With this extra rule, we can trivially prove that White has a foolproof strategy
to always win or tie:
Assume White doesn’t have a foolproof strategy. Then, assuming perfect play, Black
always wins. But then White can pass on their first turn, making them effectively the
C.4. MODAL LOGIC 131
second player, and then follow the winning strategy for Black. This means that Black
doesn’t have a winning strategy, which means White must have a foolproof one.
It’s an elegant and watertight proof. It also gives us zero information on what the
foolproof strategy actually is, and so chess remains stubbornly intractable.
Non-constructive proofs bother some mathematicians, who proposed an alternate
form of logic called Constructive Logic. Constructive logic doesn’t have the law of ex-
cluded middle, nor does it have double-negation: you can’t replace !!P with P. These
two removals mean that writing proofs and manipulating statements is harder. But
in return, it guarantees the only way to prove the existence of something is to actu-
ally find an example.
Another consequence of constructive logic is that without excluded middle, impli-
cation can work a little more like it does in normal language. The statement “if I was
named Greg, then I’d be king of England” is mathematical true in classical logic (I’m
not named Greg), but it’s not a true statement in day to day life. And it doesn’t have
to be in constructive logic, either.
Functional programming-style type systems are constructive by nature. A function
of type Int -> Bool is a “proof” that if integers exist, booleans do too, because there’s
a way to turn an example of an integer into an example of a boolean.
Answer to Exercise 2
CanRunProgram(c, p) =
Native(p) => (RAM(c, p) && CPU(c, p)) || GPU(c, p)
Answer to Exercise 3
1. (c => x) && (!c => y) is equivalent to (!c || x) && (c || y). If you work through the
cases, you should find that IfThenElse is true when c is true and x is true, or
when c is false and y is true.
2. As hinted by the name IfThenElse is simulating a conditional. We can also write
it like this:
132
133
Child & Adult == {}. Another way would be Child - Adult == Child && Adult - Child ==
Adult.
Answer to Exercise 8
If not a single developer has reviewed the pr, then EveryoneApproved is true (all zero
reviewers approved!) while SomeoneReviewed is false (nobody reviewed it).
In general, all x in {}: P(x) is always true (regardless of what P is) and some x in {}: P(x)
is always false.
Answer to Exercise 9
Answer to Exercise 11
Answer to Exercise 12
IsDivisibleBy(num, divisor) =
some x in 1..=num:
x*divisor = num
134 APPENDIX D. ANSWERS TO EXERCISES
Answer to Exercise 14
some x, y: x != y && P
Answer to Exercise 15
all x, y, z:
(1. x != y
2. y != z
3. z != x
) => P(x, y, z)
Answer to Exercise 16
all x: P(x)
Answer to Exercise 17
Answer to Exercise 18
First rewrite it was !P || Q. Then replace Q with !(!Q) to get !(!Q) || !P. Then rewrite that
as !Q => !P.
135
Python and Haskell use all() and any(). Javascript uses Array.every() and Array.some().
C++ has std::all_of() and std::any_of, and also has std::none_of.
Answer to Exercise 22
In all x in set: P(x) && some x in set: P(x), the only thing that the some is doing is
checking that the set is nonempty, as that’s the only case where all x can be true and
some x can be false. So we can rewrite the code to not have that:
Answer to Exercise 23
x <= 1 || x > 10
In all x in set: P(x) => Q(x) we only check Q(x) on the elements that also satisfy P(x).
In all x in {x \in set: P(x)}: Q(x) we filter out all of the elements that don’t satisfy P(x)
and then check Q(x) on the rest. Equivalence follows.
Recall that P tests that the max of [1,2,3] is 3, while R tests that max values of [1, 2,
3] and [0, 1, -1] are >= 0.
1. To pass R and not P, write a max(l) = 1 . To pass P and not R, write a max that just
returns the last value of the input. 2.
136 APPENDIX D. ANSWERS TO EXERCISES
T =
1. max([1, 2, 3]) == 3
2. max([0, 1, -1]) == 1
test false will reject any buggy implementation of max… but it will also reject a cor-
rect implementation! What makes a given test “a test of max” is that it will pass
for a correct implementation, meaning false isn’t a test of max at all (and cannot be
stronger than them).
By contrast, test true is a valid test of max, and in fact the weakest possible test.
IsUnique(l):
all x, y in 0..<len(l):
x != y => l[x] != l[y]
Or, using disj, we could write all disj x, y instead and skip the condition.
In python:
@given(s.lists(s.integers()), s.integers())
def test_myfind(l, x):
out = myfind(l, x)
if out == -1:
assert x not in l
else:
assert l[out] == x
assert x not in l[0:out]
Note that this will statistically overtest the case where x is not in l. Part of learning to
use PBT well is getting a sense of how to best generate inputs. The techniques here
are beyond the scope of this book.
137
In the new version of max_avail_price it no longer requires “there is at least one avail-
able item”. If there are no available items, we never call max anyway, so don’t violate
its requirements.
On the other hand, max_avail_price’s postconditions get more complicated. If the re-
turn value is a number, it still is the highest price of an available item. If the return
value is None, then there were no available items. So the new contract is this:
max_avail_price(items) returns o
helpers:
available = `list of available items in items`
requires:
NOTHING AT ALL
ensures:
o == None => all i in Item: !i.available
o != None =>
`output is priciest available item`:
some i in available:
1. i.price = out
2. all i2 in available: i2.price <= i.price
# requires: a != 0
# requires: b^2 >= 4ac
def quadratic(a, b, c):
lhs = -b / (2*a)
3. There’s one thing I left out of the functional specification in part (1): sqrt guar-
antees the output is nonnegative, too.
138 APPENDIX D. ANSWERS TO EXERCISES
x = 5
# requires (a): x >= 0
y = sqrt(x)
# ensures (b): y >= 0
# requires (b): y >= 0
z = sqrt(y)
# ensures: width == x
# ensures: length == old(length)
Rectangle.setWidth(x)
In other words, setWidth ensures it only changes the width; it does not change the
length. Square.setWidth doesn’t have stronger postconditions, they do not imply
Rectangle.setWidthPost. This problem goes away if we make the classes immutable.
In general, implementing a mutable abstraction is harder than implementing a mu-
table one.
It follows from the preconditions: x >= 0 and y > 0 means that x / y > 0, and floor can’t
make a positive number into a negative one. Since q == floor(x / y), q >= 0.
The table has two rows, but is unsound (two contradictory inputs) and incomplete
(missing an input). This means it is invalid.
x % 3 == 0? x % 5 == 0? out
T T “fizzbuzz”
T F “fizz”
F T “buzz”
F F x
Answer to Exercise 35
Z W H M
T - - T
F webcam - F
F desk T T
F desk F F
First of all, if S has s elements and T has t elements, #S * #T == s*t. Since the cardinal-
ity of the set does not depend on what elements it has, only the number of elements,
I can safely assume that S = 1..=s and T = 1..=t. Now, I just need to show that (1..=s)
x (1..=t) has s*t elements. To do this, I will put all of the elements in a grid:
There’s two ways we can do this. The first is to write it all as one predicate:
140 APPENDIX D. ANSWERS TO EXERCISES
constraint
all disj ug1, ug2 in user_groups:
ug1.user_id != ug2.user_id ||
ug1.group_id != ug2.group_id
But the way I’d prefer to do it instead would be to first write a helper operator, and
then use that in the actual constraint.
constraint
all disj ug1, ug2 in user_groups:
!SameUserAndGroup(ug1, ug2)
Answer to Exercise 38
constraint UsersInGroupsHaveEmail =
all ug in user_groups, u in users:
ug.user_id = u.id => u.email != NULL
Answer to Exercise 39
MemberOf(u, g) =
some ug in user_groups:
1. ug.user_id = u.id
2. ug.group_id = g.id
constraint MaxFiveMembers =
all g in groups:
#{u in User: MemberOf(u, g)} <= 5
(We have to wrap from in braces because you can’t union a set and a string, only a
set and another set.)
Answer to Exercise 41
141
At every step, Alice must transfer at least one dollar to Bob. Eventually there is some
t where alice[t] == 0 && bob[t] == 20. Then Alice can’t make a transfer, Transfer(t) is
false, and so Spec is false.
TransferBobToAlice(t: Time) =
some value in 1..=bob[t]:
1. alice[t+1] == alice[t] - value
2. bob[t+1] == bob[t] + value
Next(t) =
|| TransferAliceToBob(t)
|| TransferBobToAlice(t)
Now, can Alice and Bob transfer to each other in the same step? No. Let’s say they
both start with 10 dollars and each try to transfer five dollars to each other. By Trans-
ferAliceToBob we have:
1. alice[1] == alice[0] - 5 == 5
2. bob[1] == bob[0] + 5 == 15
1. bob[1] == bob[0] - 5 == 5
2. alice[1] == alice[0] + 5 == 15
Stutter =
1. alice' == alice
2. bob' == bob
142 APPENDIX D. ANSWERS TO EXERCISES
1. ``[](all x: P(x))``
2. ``all t in Time: all x: P(t, x)`` (definition of ``always``)
3. ``all x: all t in Time: P(t, x)`` (commutivity of ``all``)
4. ``all x: []P(x)`` (definition of ``always``)
Answer to Exercise 47
There’s a couple of ways to show this. The first is to say that if test 1 is assigned to
server N, for any other server M we have the clause (!a1N || !a1M). Since we already
have a1N, the only way for the clause to be true is if !a1M, ie test 1 isn’t assigned to
M.
Another way to see this is to rewrite (!a11 || !a12) && (!a11 || !a13) as a11 => !a12 &&
a11 => !a13 ..., which is equivalent to a11 => !a12 && !a13 && ....
Index
Non-alphabetical I
' (prime), 69 in (set), 10
- (set), 11 incomplete, see complete
- (table), see any
& (set), 11 L
&& (and), 6 liveness, 104
=> (implies), 8 Logic programming, 117
`backticks` (in predicates), 5 loop invariant, 55
| (set), 11
|| (or), 6 M
Metamorphic properties, 39
A Minizinc, 110
action, 99 MISU, 47
all, 14
Alloy, 87 P
any (table), 64 planning, 121
assert, 41 PlusCal, 101
assertion, 41 postcondition, 42
power set, 11
B precondition, 42
behavior, 95 predicate, 5
proof, 53
C Property-Based Testing, 36
cartesian product, 76
complete, 65 Q
Conjunctive Normal Form, 112 quantifier, 12, 13
contract, 43 scoped quantifier, 13
D R
Dafny, 58 refinement, 90
datalog, 119 Relational Model, 73
Decision Table, 63 requires, 42
deductive database, 119 rewrite rules, 18
disj, 18
Domain of Discourse, 10 S
safety, 104
E set, 10
ensures, 42 set filter, 12
set map, 12
F SMT, 113
Formal Specification, 86 some, 13
Formal verification, 58 sound, 65
fuzzing, 38 specification
143
144 INDEX
total, 34
subset, 11
T
Temporal Logic, 98
Theorem, 20
TLA+, 100
truth table, 7
U
unsound, see sound
X
x (sets), see cartesian product