Chapter Notes
James Aspnes
2020-10-24 14:59
Chapter 13

Quantum computing
Any such state vector x for the system must consist of non-negative
real values that sum to 1; this is just the usual requirement for a discrete
probability space. Operations on the state consist of taking the old values of
one or both bits and replacing them with new values, possibly involving both
randomness and dependence on the old values. The law of total probability
applies here, so we can calculate the new state vector x′ from the old state
vector x by the rule

    x′_i = Σ_j Pr[new state = i | old state = j] · x_j,

where i and j range over the four possible values of the bit pair. These are
linear functions of the previous state vector, so we can summarize the effect
of our operation using a transition matrix A, where x′ = Ax.[1]
We imagine that these operations are carried out by feeding the initial
state into some circuit that generates the new state. This justifies calling
this model of computation a random circuit. But the actual implementation
might be an ordinary computer that is just flipping coins. If we can
interpret each step of the computation as applying a transition matrix, the
actual implementation doesn't matter.
For example, if we negate the second bit 2/3 of the time while leaving
the first bit alone, we get the matrix

A = [ 1/3  2/3   0    0  ]
    [ 2/3  1/3   0    0  ]
    [  0    0   1/3  2/3 ]
    [  0    0   2/3  1/3 ].
One way to derive this matrix other than computing each entry directly is
that it is the tensor product of the matrices that represent the operations
on each individual bit. The idea here is that the tensor product of A and B,
written A ⊗ B, is the matrix C with C_{ij,kℓ} = A_{ik} B_{jℓ}. We're cheating a little
bit by allowing the C matrix to have indices consisting of pairs of indices,
one for each of A and B; there are more formal definitions that justify this
at the cost of being harder to understand.
[1] Note that this is the reverse of the convention we adopted for Markov chains in
Chapter 9. There it was convenient to have P_{ij} = p_{ij} = Pr[X_{t+1} = j | X_t = i]. Here we
defer to the physicists and make the update operator come in front of its argument, like
any other function.
For our example,

A = [ 1  0 ] ⊗ [ 1/3  2/3 ]
    [ 0  1 ]   [ 2/3  1/3 ].
The first matrix in the tensor product gives the update rule for the first bit
(the identity matrix—do nothing), while the second gives the update rule for
the second.
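As a concrete check (an illustrative sketch of my own, not code from the notes), the following numpy snippet builds A as the Kronecker product of the 2 × 2 identity with the single-bit negation matrix; `numpy.kron` implements exactly the tensor product defined above. The sample distribution x is an arbitrary choice for illustration.

```python
import numpy as np

# Single-bit operation: negate the bit with probability 2/3.
B = np.array([[1/3, 2/3],
              [2/3, 1/3]])
I2 = np.eye(2)

# Two-bit operation: leave the first bit alone, apply B to the second.
A = np.kron(I2, B)
print(A)                      # matches the 4x4 matrix given above

# Apply it to a sample distribution over the states 00, 01, 10, 11.
x = np.array([0.5, 0.0, 0.25, 0.25])
x_new = A @ x
print(x_new, x_new.sum())     # still a probability distribution (sums to 1)
```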
Some operations are not decomposable in this way. If we swap the values
of the two bits, we get the matrix

S = [ 1  0  0  0 ]
    [ 0  0  1  0 ]
    [ 0  1  0  0 ]
    [ 0  0  0  1 ].
Applying several operations in sequence corresponds to multiplying their
matrices; for example, applying A, then the swap S, then A again gives, in matrix
terms:

x_out = A S A x_in

      = [ 1/3  2/3   0    0  ] [ 1  0  0  0 ] [ 1/3  2/3   0    0  ]
        [ 2/3  1/3   0    0  ] [ 0  0  1  0 ] [ 2/3  1/3   0    0  ]
        [  0    0   1/3  2/3 ] [ 0  1  0  0 ] [  0    0   1/3  2/3 ]
        [  0    0   2/3  1/3 ] [ 0  0  0  1 ] [  0    0   2/3  1/3 ] x_in.
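A quick numerical sanity check of this composition (again a sketch of mine, not from the notes; the choice of x_in is arbitrary): multiply the matrices in the same order and confirm that the composite still maps probability distributions to probability distributions.

```python
import numpy as np

B = np.array([[1/3, 2/3],
              [2/3, 1/3]])
A = np.kron(np.eye(2), B)            # negate the second bit with prob. 2/3

S = np.array([[1, 0, 0, 0],          # swap the two bits
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

x_in = np.array([1.0, 0.0, 0.0, 0.0])    # start deterministically in state 00
x_out = A @ S @ A @ x_in                  # x_out = A S A x_in
print(x_out, x_out.sum())                 # a distribution over 00, 01, 10, 11
```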
The typographic trick is to split in half both ⟨x|y⟩ and its expansion.
The operator |01⟩⟨10| maps |10⟩ to |01⟩ (Proof: |01⟩⟨10| |10⟩ = |01⟩ ⟨10|10⟩ = |01⟩) and sends all
other basis states to 0. Add up four of these mappings to get

S = |00⟩⟨00| + |10⟩⟨01| + |01⟩⟨10| + |11⟩⟨11|

  = [ 1  0  0  0 ]
    [ 0  0  1  0 ]
    [ 0  1  0  0 ]
    [ 0  0  0  1 ].
Here the bra-ket notation both labels what we are doing and saves writing a
lot of zeros.
The intuition is that just like a ket represents a state, a bra represents a
test for being in that state. So something like |01⟩⟨10| tests if we are in the
|10⟩ state, and if so, sends us to the |01⟩ state.
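To see this bookkeeping in matrix form, here is a small sketch (my own illustration, not from the notes) that builds each |a⟩⟨b| as an outer product of standard basis vectors and sums them to recover the swap matrix S.

```python
import numpy as np

def ket(bits):
    """Column vector for a computational basis state, e.g. ket('10')."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

def outer(a, b):
    """The operator |a><b| as a matrix."""
    return np.outer(ket(a), ket(b))

S = outer('00', '00') + outer('10', '01') + outer('01', '10') + outer('11', '11')
print(S)                 # the swap matrix from above
print(S @ ket('10'))     # |01>: swapping 10 gives 01
```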
In this notation, the state of a random bit that is 0 with probability p₀ and 1
with probability p₁ is written as

p₀ |0⟩ + p₁ |1⟩.

A qubit replaces these probabilities with amplitudes: its state is a superposition

a₀ |0⟩ + a₁ |1⟩,

where a₀ and a₁ are complex numbers with |a₀|² + |a₁|² = 1.[2] The reason for
this restriction on amplitudes is that if we measure a qubit, we will see state 0
with probability |a₀|² and state 1 with probability |a₁|². Unlike with random
bits, these probabilities are not mere expressions of our ignorance but arise
from the underlying physics of quantum measurement.
[2] The absolute value, norm, or magnitude |a + bi| of a complex number is given by
√(a² + b²). When b = 0, this is the same as the absolute value for the corresponding real
number. For any complex number x, the norm can also be written as √(x̄x), where x̄ is
the complex conjugate of x. This is because √((a + bi)(a − bi)) = √(a² − (bi)²) = √(a² + b²).
The appearance of the complex conjugate here explains why we define ⟨x|y⟩ = x*y; the
conjugate transpose means that for ⟨x|x⟩, when we multiply x*_i by x_i we are computing a
squared norm.
Another way to state this is that the columns of A form an orthonormal basis: this
means that ⟨A_i|A_j⟩ = 0 if i ≠ j and 1 if i = j, which is just a more verbose
way to say A*A = I. The same thing also works if we consider rows instead
of columns.
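As an illustrative check (not from the notes), the sketch below verifies both characterizations for the swap operator S from earlier: its conjugate transpose times itself is the identity, and its columns are pairwise orthonormal.

```python
import numpy as np

S = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=complex)

# A*A = I, where A* is the conjugate transpose.
print(np.allclose(S.conj().T @ S, np.eye(4)))     # True

# Equivalently, <A_i|A_j> = 1 if i == j and 0 otherwise.
cols = S.T
gram = np.array([[np.vdot(cols[i], cols[j]) for j in range(4)] for i in range(4)])
print(np.allclose(gram, np.eye(4)))               # True
```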
The rule then is: at each step, we can operate on some constant number
of qubits by applying a unitary transformation to them. In principle, this
could be any unitary transformation, but some particular transformations
show up often in actual quantum algorithms.[4]
The simplest unitary transformations are permutations on states (like
the operator that swaps two qubits), and rotations of a single state. One
particularly important rotation is the Hadamard operator

H = √(1/2) [ 1   1 ]
           [ 1  −1 ].
This maps |0⟩ to the superposition √(1/2) |0⟩ + √(1/2) |1⟩; since this superposition
collapses to either |0⟩ or |1⟩ with probability 1/2, the state resulting from
H |0⟩ is the quantum-computing equivalent of a fair coin-flip. Note that
H |1⟩ = √(1/2) |0⟩ − √(1/2) |1⟩ ≠ H |0⟩. Even though both yield the same prob-
abilities, these two superpositions have different phases and may behave
differently when operated on further. That H |0⟩ and H |1⟩ are different is
necessary, and indeed a similar outcome occurs for any quantum operation:
all quantum operations are reversible, because any unitary matrix U has
an inverse U*.
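The following numpy sketch (mine, not the notes' code) illustrates these points: H|0⟩ and H|1⟩ give identical measurement probabilities but differ in phase, and applying H twice returns the original state because H is its own inverse.

```python
import numpy as np

H = np.sqrt(1/2) * np.array([[1,  1],
                             [1, -1]])

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

s0 = H @ ket0          # amplitudes ( 1/sqrt(2),  1/sqrt(2))
s1 = H @ ket1          # amplitudes ( 1/sqrt(2), -1/sqrt(2))

print(np.abs(s0)**2, np.abs(s1)**2)   # same measurement probabilities: [0.5 0.5]
print(np.allclose(s0, s1))            # False: the phases differ
print(H @ s0)                         # back to |0>, since H H = I
```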
If we apply H in parallel to all the qubits in our system, we get the n-fold
tensor product H^⊗n, which (if we take our bit-vector indices as integers
0 . . . N − 1 = 2^n − 1 represented in binary) maps |0⟩ to √(1/N) Σ_{i=0}^{N−1} |i⟩.
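A small sketch (not from the notes) that builds H^⊗n by repeated Kronecker products and confirms that applying it to the all-zeros state gives amplitude 1/√N on every basis state:

```python
import numpy as np
from functools import reduce

n = 3
N = 2 ** n
H = np.sqrt(1/2) * np.array([[1, 1], [1, -1]])

Hn = reduce(np.kron, [H] * n)       # H tensored with itself n times

zero_n = np.zeros(N)
zero_n[0] = 1.0                     # the state |0...0>

s = Hn @ zero_n
print(s)                                          # every amplitude equals 1/sqrt(N)
print(np.allclose(s, np.full(N, 1/np.sqrt(N))))   # True
```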
The controlled NOT or CNOT operator negates its second qubit exactly
when its first qubit is 1; it corresponds to the matrix

[ 1  0  0  0 ]
[ 0  1  0  0 ]
[ 0  0  0  1 ]
[ 0  0  1  0 ],

which is clearly unitary (the rows are just the standard basis vectors). We
could also write this more compactly as |00⟩⟨00| + |01⟩⟨01| + |11⟩⟨10| +
|10⟩⟨11|.
The CNOT operator gives us XOR, but for more destructive operations
we need to use more qubits, possibly including junk qubits that we won’t look
at again but that are necessary to preserve reversibility. The Toffoli gate or
controlled controlled NOT gate (CCNOT) is a 3-qubit gate that was
originally designed to show that classical computation could be performed
reversibly [Tof80]. It implements the mapping (x, y, z) ↦ (x, y, (x ∧ y) ⊕ z),
which corresponds to the 8 × 8 matrix
[ 1  0  0  0  0  0  0  0 ]
[ 0  1  0  0  0  0  0  0 ]
[ 0  0  1  0  0  0  0  0 ]
[ 0  0  0  1  0  0  0  0 ]
[ 0  0  0  0  1  0  0  0 ]
[ 0  0  0  0  0  1  0  0 ]
[ 0  0  0  0  0  0  0  1 ]
[ 0  0  0  0  0  0  1  0 ].
By throwing in some extra qubits we don't care about, Toffoli gates
can implement basic operations like NAND ((x, y, 1) ↦ (x, y, ¬(x ∧ y))),
NOT ((x, 1, 1) ↦ (x, 1, ¬x)), and fan-out ((x, 1, 0) ↦ (x, 1, x)).[5] This gives
us enough to implement arbitrary classical circuits reversibly.
[5] In the case of fan-out, this only works with perfect accuracy for classical bits and not
superpositions, which run into something called the no-cloning theorem. For example,
applying CCNOT to (1/√2)|010⟩ + (1/√2)|110⟩ yields (1/√2)|010⟩ + (1/√2)|111⟩. This works, sort of,
but the problem is that the first and last bits are still entangled, meaning we can't operate
on them independently. This is actually not all that different from what happens in the
probabilistic case (if I make a copy of a random variable X, it's correlated with the original
X), but it has good or bad consequences depending on whether you want to prevent people
from stealing your information undetected or run two copies of a quantum circuit on
independent replicas of a single superposition.
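Here is a hedged sketch (my own, not from the notes) that represents CCNOT as a function on classical 3-bit states and checks the NAND, NOT, and fan-out embeddings described above.

```python
def ccnot(x, y, z):
    """Toffoli gate on classical bits: (x, y, z) -> (x, y, (x and y) xor z)."""
    return (x, y, (x & y) ^ z)

# NAND: fix the third input to 1.
assert all(ccnot(x, y, 1)[2] == 1 - (x & y) for x in (0, 1) for y in (0, 1))

# NOT: fix the second and third inputs to 1.
assert all(ccnot(x, 1, 1)[2] == 1 - x for x in (0, 1))

# Fan-out: fix the second input to 1 and the third to 0; the output copies x.
assert all(ccnot(x, 1, 0)[2] == x for x in (0, 1))

print("CCNOT gives NAND, NOT, and fan-out on classical bits")
```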
2. We can represent f(x) by changing the phase of |x⟩: |x⟩ ↦ (−1)^f(x) |x⟩.
   The actual unitary operator corresponding to this is Σ_x (−1)^f(x) |x⟩⟨x|,
   which for the XOR function is the matrix

   [ 1   0   0   0 ]
   [ 0  −1   0   0 ]
   [ 0   0  −1   0 ]
   [ 0   0   0   1 ].
This has the advantage of requiring one fewer qubit, but we can’t
observe the value of f(x) directly, because the amplitude of |x⟩ is
unchanged. Where this comes in handy is when we can use the change in
phase in the context of a larger quantum algorithm to get cancellations
for some values of x.
The first representation makes more sense for modeling classical circuits.
The second turns out to be more useful when f is a subroutine in a larger
quantum algorithm. Fortunately, the operator with matrix

[ 1   0 ]
[ 0  −1 ]

lets us convert between the two representations.
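As an illustrative sketch (not from the notes), the function below builds the phase-representation operator Σ_x (−1)^f(x) |x⟩⟨x| for any Boolean f on n bits and checks that for XOR it is the diagonal matrix shown above. The helper name `phase_oracle` is my own.

```python
import numpy as np

def phase_oracle(f, n):
    """Diagonal unitary with entry (-1)^f(x) for each n-bit input x."""
    N = 2 ** n
    return np.diag([(-1.0) ** f(x) for x in range(N)])

# f(x1, x0) = x1 XOR x0, with x given as an integer index.
xor = lambda x: ((x >> 1) ^ x) & 1

U = phase_oracle(xor, 2)
print(U)                                        # diag(1, -1, -1, 1)
print(np.allclose(U.conj().T @ U, np.eye(4)))   # unitary
```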
Our goal is for this final state to tell us what we want to know, with
reasonably high probability.
This puts all the weight on |0⟩, so when we take our measurement at the
end, we'll see 0.
Alternatively, if f(0) = b ≠ f(1), it's the |0⟩ terms that cancel out,
leaving (−1)^b |1⟩. Again the phase depends on b, but we don't care about the
phase: the important thing is that if we measure the qubit, we always see 1.
The result in either case is that with probability 1, we determine the
value of f (0) ⊕ f (1), after evaluating f once (albeit on a superposition of
quantum states).
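This two-step circuit is easy to simulate; the sketch below (my own illustration, not the notes' code) applies H, then the phase oracle for f, then H again, and confirms that the measurement outcome is f(0) ⊕ f(1) with probability 1.

```python
import numpy as np

H = np.sqrt(1/2) * np.array([[1, 1], [1, -1]])

def deutsch(f):
    """Measurement probabilities after applying H, the phase oracle, then H to |0>."""
    U_f = np.diag([(-1.0) ** f(0), (-1.0) ** f(1)])   # phase representation of f
    state = H @ (U_f @ (H @ np.array([1.0, 0.0])))
    return np.abs(state) ** 2

for f in [lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x]:
    probs = deutsch(f)
    print(probs, "-> always measure", f(0) ^ f(1))
```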
This is kind of a silly example, because the huge costs involved in building
our quantum computer probably swamp the factor-of-2 improvement we got
in the number of calls to f . But a generalization of this trick, known as the
Deutsch-Jozsa algorithm [DJ92], solves the much harder (although still a bit
contrived-looking) problem of distinguishing a constant Boolean function on
n bits from a function that outputs one for exactly half of its inputs. No
deterministic algorithm can solve this problem without computing at least
2^n/2 + 1 values of f, giving an exponential speed-up.[7]
Making this work requires showing that (a) we can generate the original
superposition |s⟩, (b) we can implement D efficiently using unitary operations
on a constant number of qubits each, and (c) we actually get w at the end
of this process.
Here we use the fact that |s⟩⟨s| |s⟩⟨s| = |s⟩ ⟨s|s⟩ ⟨s| = |s⟩ (1) ⟨s| = |s⟩⟨s|.
Recall that |s⟩ = H^⊗n |0^n⟩, where H^⊗n is the result of applying H to
each of the n bits individually. We also have that H* = H and HH* = I,
from which H^⊗n H^⊗n = I as well.
So we can expand

D = 2 |s⟩⟨s| − I
  = 2 H^⊗n |0^n⟩ (H^⊗n |0^n⟩)* − I
  = 2 H^⊗n |0^n⟩ ⟨0^n| H^⊗n − I
  = H^⊗n (2 |0^n⟩⟨0^n| − I) H^⊗n.
D = 2 |s⟩⟨s| − I
  = 2 ((sin θ) |w⟩ + (cos θ) |u⟩) ((sin θ) ⟨w| + (cos θ) ⟨u|) − I
  = (2 sin²θ − 1) |w⟩⟨w| + (2 sin θ cos θ) |w⟩⟨u| + (2 sin θ cos θ) |u⟩⟨w| + (2 cos²θ − 1) |u⟩⟨u|
  = (−cos 2θ) |w⟩⟨w| + (sin 2θ) |w⟩⟨u| + (sin 2θ) |u⟩⟨w| + (cos 2θ) |u⟩⟨u|

  = [ −cos 2θ   sin 2θ ]
    [  sin 2θ   cos 2θ ].
Ideally, we pick t so that (2t + 1)θ = π/2, which would put all of the
amplitude on |w⟩. Because t is an integer, we can't do this exactly, but
setting t = ⌊(π/(2θ) − 1)/2⌋ will get us somewhere between π/2 − 2θ and π/2. Since
θ ≈ √(1/N), this gives us a probability of seeing |w⟩ in our final measurement
of 1 − O(√(1/N)) after O(√N) iterations of U_w D.
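To make the iteration count concrete, here is a small numpy simulation (an illustrative sketch under the notation above, not code from the notes; n and w are arbitrary choices): it builds U_w and D explicitly, runs t = ⌊(π/(2θ) − 1)/2⌋ Grover iterations (oracle, then diffusion), and reports the probability of measuring the marked state w.

```python
import numpy as np

n = 8                      # number of qubits (small enough to simulate directly)
N = 2 ** n
w = 37                     # index of the marked element (arbitrary choice)

s = np.full(N, 1 / np.sqrt(N))          # uniform superposition |s>
U_w = np.eye(N); U_w[w, w] = -1         # flips the phase of |w>
D = 2 * np.outer(s, s) - np.eye(N)      # diffusion operator 2|s><s| - I

theta = np.arcsin(1 / np.sqrt(N))
t = int((np.pi / (2 * theta) - 1) / 2)  # t = floor((pi/(2 theta) - 1)/2)

state = s.copy()
for _ in range(t):
    state = D @ (U_w @ state)           # apply the oracle, then the diffusion operator

print(t, "iterations; Pr[measure w] =", abs(state[w]) ** 2)
```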
Sadly, this is as good as it gets. A lower bound of Bennett et al. [BBBV97]
shows that any quantum algorithm using U_w as the representation for f must
apply U_w at least Ω(√N) times to find w. So we get a quadratic speedup
but not the exponential speedup we'd need to solve NP-complete problems
directly.