Sparse Fourier
Ludwig Schmidt
MIT
[email protected]
1 Introduction
The Discrete Fourier Transform (DFT) is one of the main mathematical workhorses of signal
processing. The most popular approach for computing the discrete Fourier transform is the Fast
Fourier Transform (FFT) algorithm. Invented in the 1960s, the FFT computes the frequency
representation of a signal of size N in O(N log N ) time. The FFT is widely used and was
considered to be one of the most influential and important algorithmic developments of the
20th century. However, with the emergence of big data problems, in which the size of the
processed data sets can easily exceed terabytes, the Fast Fourier Transform is no longer always
fast enough. Furthermore, in many applications it is hard to acquire a sufficient amount of
data to compute the desired Fourier transform. For example, in medical imaging, it is highly
desirable to reduce the time that the patient spends in an MRI machine. This motivates the
need for algorithms that can compute the Fourier transform in sub-linear time (in an amount
of time that is considerably smaller than the size of the data), and that use only a subset of the
input data. The Sparse Fourier Transform (SFT) provides precisely this functionality.
Developed over the last decade, sparse Fourier transform algorithms compute an approxi-
mation or compressed version of the DFT in time proportional to the sparsity of the spectrum
of the signal (i.e., the number of dominant frequencies), as opposed to the length of the sig-
nal. The algorithms use only a small subset of the input data and run in time proportional to
the sparsity or desired compression, considerably faster than in time proportional to the signal
length. This is made possible by requiring that the algorithms report only the non-zero or large
frequencies and their complex amplitudes, rather than a vector containing this information for
all frequencies. Since most video, audio, medical images, spectroscopic measurements (e.g.,
NMR), GPS signals, seismic data, and many other massive data sets are compressible or
sparse, these results promise a significant practical impact in many big data domains.
The first algorithms of this type were designed for the Hadamard Transform; i.e., the Fourier
transform over the Boolean cube [KM91, Lev93] (cf. [GL89]). Shortly thereafter, algorithms for
the complex Fourier transform were discovered as well [Man92, GGI+ 02, AGS03, GMS05]. The
most efficient of those algorithms [GMS05] computed the DFT in time k log^{O(1)} N, where k is
the sparsity of the signal spectrum. All of the algorithms are randomized and have a constant
probability of error. These developments were covered in [GST08].
Over the last few years, the topic has been the subject of extensive research, from the
algorithmic [Iwe10, Iwe13, Aka10, HIKP12b, HIKP12a, HAKI12, LWC13, BCG+ 12, GHI+ 13,
HKPV13, PR13, SHV13, IKP14], implementation [Sch13, WAPC+ 13] and hardware [YRR+ 12,
AHH+ 14] perspectives. These developments include the first deterministic algorithms that
make no errors [Iwe10, Iwe13, Aka10], as well as algorithms that, given a signal with k-sparse
spectrum, compute the non-zero coefficients in time O(k log N ) [HIKP12a] or even O(k log k)
[LWC13, PR13, GHI+ 13, SHV13]. The goal of this article is to survey these developments. We
focus on explaining the basic components and techniques used in the aforementioned algorithms,
coupled with illustrative examples and concrete applications. We do not cover the analysis of
sampling rates and running times of the algorithms (the reader is referred to the original papers
for proofs and analysis). However, we present an empirical analysis of the performance of
the algorithms. Finally, we discuss the connection between the SFT and other techniques for
massive data analysis such as streaming algorithms and compressive sensing.
2 Definitions
Given a vector f ∈ C^N, its discrete Fourier transform f̂ ∈ C^N is defined via the N × N DFT matrix F, whose entries are

$$F_{\omega,j} := \frac{e^{-2\pi i \omega j / N}}{N}, \tag{1}$$

so that

$$\hat{f} := F f. \tag{2}$$

Written out entrywise, this is

$$\hat{f}_\omega = \frac{1}{N} \sum_{j=0}^{N-1} f_j \, e^{-2\pi i \omega j / N}. \tag{3}$$
In this section it will also be useful to consider the inverse of the DFT matrix above. Its
entries are given by
$$F^{-1}_{j,\omega} := e^{2\pi i \omega j / N}, \tag{4}$$

so that

$$f := F^{-1} \hat{f} = F^{-1} (F f). \tag{5}$$
This allows one to write f ∈ CN in terms of its Fourier components by the formula
$$f_j = \sum_{\omega=0}^{N-1} \hat{f}_\omega \, e^{2\pi i \omega j / N}. \tag{6}$$
3 Techniques
In this section we outline the basic components and techniques used in sparse FFT algorithms.
We start from the simple case when the spectrum of the signal consists of, or is dominated by,
only a single non-zero frequency. In this case we show that the position and the value of the
non-zero frequency can be found using only two samples of the signal (if the signal is a pure tone
without any noise) or a logarithmic number of samples (if the signal is contaminated by noise).
These techniques are described in Section 3.1. Second, we address the general case, by reducing
it to several sub-problems involving a single non-zero frequency. This is achieved by grouping
subsets of Fourier space together into a small number of bins. In the simplest case, each bin
corresponds to a frequency band, but other groupings are also possible. If each bin contains
only a single frequency (i.e., if a frequency is isolated), then we can solve the problem separately
for each bin using the techniques mentioned earlier. This leads to sample complexity and
running time proportional to the number of bins, which is lower bounded by the sparsity
parameter k. The binning techniques are described in Section 3.2. Finally, in Section 3.3,
we show how to deal with spectra in which two non-zero frequencies are very close to each
other, and thus cannot be easily isolated via binning. Specifically, we show how to permute
the spectrum of the signal in a pseudo-random fashion, by pseudo-randomly permuting the
time domain signal. Since the positions of the non-zero frequencies in the permuted signals are
(pseudo)-random, they are likely to be isolated, and then recovered by the binning procedure.
To ensure that all non-zero coefficients are recovered, the permutation and binning procedure
is repeated several times, using fresh randomness every time.
All sparse FFT algorithms follow this general approach, as discussed in more detail in
Section 4. However, they differ in the implementations of the specific modules, as well as in
the methods they use for aggregating the information gathered from different invocations of the
permutation and binning procedure.
3.1 Single Frequency Recovery
In the simplest possible case, a vector f ∈ CN contains a single pure frequency. In such a
setting, an SFT must be able to determine the single tone much more quickly than the
O(N log N)-time FFT. In this section we will illustrate several different techniques for
accomplishing this fundamental task. In later sections, we will demonstrate techniques for
filtering more general vectors in order to produce the type of single-frequency vectors considered
here. For now, however, we will assume that our vector is of the following form:

$$f_j := \hat{f}_\omega \cdot e^{2\pi i \cdot \omega \cdot j / N} \quad \text{for all } j \in \{0, \dots, N-1\}. \tag{7}$$

3.1.1 Recovery from Two Samples

Given a simple vector defined as in (7), one can quickly calculate ω by choosing j ∈ {0, . . . , N −
1}, and then computing the phase of
$$\frac{f_{j+1}}{f_j} = e^{2\pi i \omega / N} = \cos\left(\frac{2\pi\omega}{N}\right) + i \cdot \sin\left(\frac{2\pi\omega}{N}\right).^{1} \tag{8}$$
Furthermore, once ω is known, fˆω can be calculated by computing fj · e−2πiωj/N . This proce-
dure2 effectively finds the DFT of the vector from (7) in O(1)-time by inspecting only 2 entries
of f .
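In code, the two-sample procedure is only a few lines. The following sketch is our own illustration in Python with numpy (not taken from any of the cited implementations); it recovers a noiseless pure tone via (8):

```python
import numpy as np

def recover_pure_tone(f):
    """Recover (omega, coefficient) from a noiseless pure tone
    f_j = c * exp(2*pi*i*omega*j/N) using only the entries f_0 and f_1."""
    N = len(f)
    # By (8), the ratio f_1 / f_0 equals exp(2*pi*i*omega/N); its phase reveals omega.
    phase = np.angle(f[1] / f[0])                    # lies in (-pi, pi]
    omega = int(np.round(phase * N / (2 * np.pi))) % N  # mod N handles negative phases
    return omega, f[0]                               # f_0 = c * e^0 is the coefficient

# Example: a pure tone with omega = 5 and coefficient 2 - 1j, N = 16.
N, omega, c = 16, 5, 2 - 1j
f = c * np.exp(2j * np.pi * omega * np.arange(N) / N)
print(recover_pure_tone(f))  # -> 5 and the coefficient 2 - 1j
```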
Although fast, this straightforward technique is not generally very robust to noise. Specifically, if f_j := f̂_ω · e^{2πiωj/N} + ε_j for all j, the phases calculated from (8) can fail to yield ω unless |ε_j| is much smaller than |f̂_ω|/N. Hence, it is often necessary to use different techniques to find ω from (7).
3.1.2 A Binary Search in Frequency
One means of learning ω from (7) in a more noise tolerant fashion is to perform the equivalent
of a binary search for ω through the frequency domain. Many variants of such a search can be
performed. In this subsection we will illustrate the most basic type of search for the example
vector

$$f_j := \hat{f}_\omega \cdot e^{2\pi i \cdot \omega \cdot j / 8}. \tag{9}$$

Note that this is exactly (7) with N = 8. Here, ω is unknown. We begin knowing only that ω ∈ {0, 1, 2, 3, 4, 5, 6, 7}.

1 If j + 1 = N, we set f_{j+1} = f_0. More generally, we will assume that indices are always taken modulo N when referring to entries of f ∈ C^N below.

2 The procedure, referred to as “the OFDM trick” in [HIKP12a], was also used in [LWC13]. It can also be seen as a very special case of the Prony method [HKPV13].

Figure 1: Recovering a single frequency via a binary search (see Section 3.1.2). Both subfigures show the unit circle in the complex plane. The black dots indicate the potential locations of the dominant, hidden frequency, which is represented by the gray square. (a) The initial search problem: our goal is to locate the hidden frequency ω = 5, i.e., the gray square at e^{2πiω/8}. In the first stage of the binary search, we determine that the hidden frequency lies in the third quadrant. (b) The simplified search problem associated with f′, which is the second stage of the binary search: the unknown frequency ω = 5 has been mapped to ω′ = 1 (i.e., the gray square) within a smaller search space.
We first choose some j ∈ {0, . . . , 7} and then test whether both

$$\left| i \cdot f_j - f_{j+1} \right| < \left| i \cdot f_j + f_{j+1} \right| \tag{10}$$

and

$$|f_j| \cdot \left| 1 - e^{2\pi i \cdot \omega/8} \right| = \left| f_j - f_{j+1} \right| < \left| f_j + f_{j+1} \right| = |f_j| \cdot \left| 1 + e^{2\pi i \cdot \omega/8} \right| \tag{11}$$

are true. Note that (10) and (11) will be true if and only if

$$\left| e^{2\pi i \omega/8} - i \right| < \left| e^{2\pi i \omega/8} - (-i) \right| \tag{12}$$

and

$$\left| e^{2\pi i \omega/8} - 1 \right| < \left| e^{2\pi i \omega/8} - (-1) \right| \tag{13}$$
are true, respectively. Hence, (10) and (11) are simply testing which axes of the complex plane
are best aligned with e^{2πiω/8}. If (10) holds true we may safely conclude that ω ∉ {5, 6, 7} (recall Figure 1(a)). Otherwise, if (10) is false, we conclude that ω ∉ {1, 2, 3}. Similarly, (11) holding true implies that ω ∉ {3, 4, 5}, while (11) failing to hold implies that ω ∉ {0, 1, 7}.
Returning to our example (9), suppose that both (10) and (11) fail to hold. The first test (10) failing tells us that ω ∉ {1, 2, 3}, and the second failure tells us that ω ∉ {0, 1, 7}. Taken all together, then, the tests tell us that ω ∈ {4, 5, 6} in this case (i.e., we learn that e^{2πi·ω/8} is in the third quadrant of the complex plane).
Having learned that ω ∈ {4, 5, 6} allows us to simplify the problem. In particular, we may now implicitly define a new vector

$$f'_j := f_{2j} \cdot e^{-2\pi i \cdot 4 \cdot (2j)/8} = \hat{f}_\omega \cdot e^{2\pi i (\omega - 4) j / 4} \tag{14}$$

for all 0 ≤ j < 4. Note that f′ ∈ C^4 was formed by (i) shifting the possible range for ω into the first quadrant (by multiplying f by e^{−2πi·4j/8}), and then (ii) discarding the odd entries. This effectively halves our initial problem: f′ ∈ C^4 is a vector with one frequency, ω′ = (ω − 4) ∈ {0, 1, 2} (see Figure 1(b)). Our new goal is to find ω′ using 2 entries of f′ (i.e., 2 additional entries of f).
Our second round of tests now proceeds exactly as before. We choose j ∈ {0, . . . , 3} and then consider both

$$\left| i \cdot f'_j - f'_{j+1} \right| < \left| i \cdot f'_j + f'_{j+1} \right|, \tag{15}$$

and

$$\left| f'_j - f'_{j+1} \right| < \left| f'_j + f'_{j+1} \right|. \tag{16}$$
As above, these tests will collectively determine the quadrant of the complex plane containing e^{2πi·(ω−4)/4}.
Continuing our example, suppose that (15) is true and (16) is false. This means that (ω − 4) ∈ {1, 2} (i.e., we have ruled out 0 and 3). We can now implicitly form our last new vector for the third round of tests. In particular, we form f″ ∈ C^2 by

$$f''_j := f'_{2j} \cdot e^{-2\pi i \cdot 1 \cdot (2j)/4} = \hat{f}_\omega \cdot e^{2\pi i (\omega - 5) j / 2} \tag{17}$$

for j = 0, 1. Note that we have once again formed our new vector by (i) shifting the possible values of ω′ = (ω − 4) to the first quadrant of the complex plane, and then (ii) discarding all odd entries of f′. We now know that ω″ = (ω − 5) ∈ {0, 1}, and may decide which it is by testing f″.
In particular, suppose that

$$\left| f''_j - f''_{j+1} \right| < \left| f''_j + f''_{j+1} \right| \tag{18}$$

holds. As before, this tells us that e^{2πi·(ω−5)/2} is better aligned with 1 than with −1, so that

ω − 5 = 0 ⇒ ω = 5.
This concludes the description of the binary search procedure for identifying the non-zero
frequency. Since we learn the position of the frequency bit by bit, the total number of samples
used is O(log N ). Furthermore, we note that the binary search is (relatively) robust to noise.
Adding small perturbations to each entry of (9) will not stop us from determining that ω = 5.
Further details are given, e.g., in [GST08].
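The complete search fits in a short routine. The sketch below is our own illustrative implementation (not from the cited papers) for a noiseless pure tone with N a power of two; each round reads two entries of the current implicit vector, applies the quadrant tests (10)/(11), and shifts the surviving arc of candidate frequencies into the first quadrant:

```python
import numpy as np

def binary_search_tone(f):
    """Locate the frequency of a noiseless pure tone f (length N a power of two)
    via the quadrant tests (10)/(11), using ~2 log2(N) samples of f."""
    N = len(f)
    M, stride, s = N, 1, 0  # current length, subsampling stride, accumulated shift
    while M > 2:
        # Two consecutive entries of the implicit vector
        # g_j = f_{stride*j} * e^{-2 pi i s stride j / N}, whose single
        # frequency is (omega - s) mod M.  Here we take j = 0.
        u = f[0]
        v = f[stride] * np.exp(-2j * np.pi * s * stride / N)
        test10 = abs(1j * u - v) < abs(1j * u + v)  # closer to i than to -i?
        test11 = abs(u - v) < abs(u + v)            # closer to 1 than to -1?
        if test10 and test11:
            delta = 0                # first quadrant already
        elif test10:
            delta = M // 4           # second quadrant
        elif not test11:
            delta = M // 2           # third quadrant
        else:
            delta = 3 * (M // 4)     # fourth quadrant
        s += delta     # shift the surviving candidates down to {0, ..., M/4}
        stride *= 2    # discard the odd entries of the implicit vector
        M //= 2
    # Base case M = 2: decide the final bit with a test of the form (18).
    u = f[0]
    v = f[stride % N] * np.exp(-2j * np.pi * s * stride / N)
    return (s + (0 if abs(u - v) < abs(u + v) else 1)) % N

# The running example of the text: N = 8 and hidden frequency omega = 5.
f = np.exp(2j * np.pi * 5 * np.arange(8) / 8)
print(binary_search_tone(f))  # -> 5
```

Ties at quadrant boundaries are harmless here: the surviving arcs of candidates are closed and overlap at their endpoints, so either resolution of a tie keeps the hidden frequency among the candidates.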
3.1.3 Aliasing-Based Search

A third means of identifying ω from (7) works by aliasing, i.e., by subsampling f. Suppose, for example, that

N = 70 = 2 · 5 · 7,

and let a ∈ C^2 be the 2-element sub-vector of f from (7) with

$$a_j := f_{35j} \quad \text{for } j = 0, 1. \tag{19}$$

A short calculation shows that the DFT of a, â ∈ C^2, has entries

$$\hat{a}_0 = \hat{f}_\omega \cdot \frac{1 + (-1)^{\omega}}{2} \tag{20}$$

and

$$\hat{a}_1 = \hat{f}_\omega \cdot \frac{1 + (-1)^{\omega+1}}{2}. \tag{21}$$

Hence, exactly one of â_0 and â_1 is non-zero, and its index reveals the residue of ω modulo 2. Subsampling f at spacings 70/5 = 14 and 70/7 = 10 similarly reveals ω modulo 5 and ω modulo 7, after which the Chinese Remainder Theorem (CRT) determines ω ∈ [0, 70) uniquely.
To finish our example, suppose that we have used FFTs of sub-vectors of f of sizes 2, 5, and 7 to determine that ω ≡ 1 mod 2, ω ≡ 4 mod 5, and ω ≡ 3 mod 7, respectively. Using that ω ≡ 1 mod 2 we can see that ω = 2 · a + 1 for some integer a. Using this new expression for ω in our second modulus we get
(2 · a + 1) ≡ 4 mod 5 ⇒ a ≡ 4 mod 5.

Hence, a = 5 · b + 4 for some integer b, so that ω = 10 · b + 9. Substituting this expression for ω into our third modulus yields

(10 · b + 9) ≡ 3 mod 7 ⇒ b ≡ 5 mod 7,

so that ω ≡ 59 mod 70, and therefore ω = 59. Note how the CRT combined several small aliased DFTs into knowledge of a frequency from a much larger bandwidth. Vectors whose DFTs contain more than one energetic frequency are, however, more difficult to handle, since frequency moduli may begin to collide modulo various numbers. In the next section we will discuss methods for removing this difficulty.
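Before moving on, here is a minimal sketch of the aliasing-plus-CRT recovery just described (the helper names are ours; numpy assumed; noiseless pure tone):

```python
import numpy as np
from math import prod

def freq_mod_p(f, p):
    """Residue of the tone's frequency modulo p: the p-point DFT of the
    sub-vector a_j = f_{(N/p) j} has its single non-zero entry at omega mod p."""
    N = len(f)
    a_hat = np.fft.fft(f[:: N // p]) / p   # the 1/p matches the normalization of (3)
    return int(np.argmax(np.abs(a_hat)))

def crt(residues, moduli):
    """Combine residues via the Chinese Remainder Theorem (moduli pairwise coprime)."""
    N = prod(moduli)
    x = 0
    for r, p in zip(residues, moduli):
        q = N // p
        x += r * q * pow(q, -1, p)         # pow(q, -1, p) = inverse of q modulo p
    return x % N

# The example of the text: N = 70 = 2*5*7 and omega = 59
# (59 = 1 mod 2, 4 mod 5, and 3 mod 7).
N, omega = 70, 59
f = np.exp(2j * np.pi * omega * np.arange(N) / N)
residues = [freq_mod_p(f, p) for p in (2, 5, 7)]
print(residues, crt(residues, (2, 5, 7)))  # -> [1, 4, 3] 59
```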
[Figure 2 panels: (a) three equally spaced subsamples of f (the red diamonds); (b) three frequency bins, one for each residue modulo three; (c) four equally spaced subsamples of f (the red diamonds); (d) four frequency bins, one for each residue modulo four.]

Figure 2: A demonstration of aliasing-based filtering. The vector under consideration has entries f_j := cos(2π · 3 · j/12) for j = 0, . . . , 11. Note that f̂ ∈ C^12 contains only two nonzero entries: f̂_3 = 1 and f̂_9 = 1. Figure 2(a) marks the entries of a sub-vector, a ∈ R^3, of f. Its DFT, â, also has three entries, each corresponding to a different subset of f̂. Each subset is labeled using a different symbol in Figure 2(b). Note that both nonzero entries of f̂ fall into the same subset; both are labeled with a green diamond. Figure 2(c) marks the entries of another sub-vector of f consisting of four entries. Its DFT partitions the entries of f̂ into four subsets (see Figure 2(d)). In this case the two nonzero entries of f̂ fall into different subsets: one is labeled with a pentagon, and the other with a triangle.
3.2 Binning the Frequencies
We will begin our discussion of filtering by extending our aliasing-based frequency identification
ideas from the last section. In this example we assume that our vector f has length twelve,
with entries given by fj := cos(2π · 3 · j/12) for j = 0, . . . , 11. Note that this means f̂ ∈ C12
has two nonzero entries: fˆ3 = 1, and fˆ9 = 1. Our objective is to learn the location and Fourier
coefficient of each of them by reading fewer than 12 entries of f .
Proceeding as in the last section, we might try to learn the two frequencies by looking
at the three element sub-vector of f , a ∈ R3 , given by aj := f4j for j = 0, 1, 2 (see Figure 2(a)).
Unfortunately, we will fail to learn anything about the individual entries of f̂ this way because
both of its nonzero DFT entries are congruent to 0 modulo 3 (see Figure 2(b)).3 In particular,
we will only see that â0 = fˆ0 + fˆ3 + fˆ6 + fˆ9 = 2, and that â1 = â2 = 0. If all we know is that f̂
contains at most two nonzero entries, we are unable to determine its nonzero entries using this
information. The problem is that the two nonzero entries of f̂ have collided modulo 3 (i.e., they
are both congruent to the same residue modulo 3).
Note, however, that the CRT guarantees that the two nonzero entries of f̂ cannot also
collide modulo 4 = 12/3. If a new sub-array of f is created using four equally spaced entries
(see Figure 2(c)), its DFT will separate the two nonzero entries of f̂ (see Figure 2(d)). The
end result is that two sub-vectors will always reveal the locations of the nonzero entries of f̂ , as
long as f̂ has at most two nonzero entries. More generally, one can use similar ideas in order to
learn the k largest magnitude entries of f̂ from the results of a small number of aliased DFTs
of sub-vectors of f .
[Figure 3 panels: (a) the filter function; (b) Fourier space: three modulations of the filter.]

Figure 3: A filter function with small (effective) support, and several frequency bins resulting from different modulations of the filter. The filter is a product of a sinc function with a Gaussian (see Figure 3(a)). The Fourier transform of the filter is a characteristic (i.e., box) function convolved with a Gaussian. Three translates of the Fourier transform, each produced by a different modulation of the filter function, are graphed in Figure 3(b). Note that modulations of the filter can be used to (effectively) regroup the Fourier spectrum into a small number of (essentially) disjoint frequency bins.
In the continuous setting, one can view the preceding discussion as a demonstration of
how a relatively small set of spike-train filters can be used in order to separate important
frequencies from one another in Fourier space. This turns out to be a fruitful interpretation.
This perspective motivates the development of other types of filters which, when modulated
(i.e., shifted in Fourier space) a few times, can be used to group different subsets of Fourier
space together into a small number of shorter intervals, or “bins” (see Figure 3). If the most
important frequencies in a function, f, are uniformly spread over a given interval of Fourier space, one will be likely to isolate them from one another by convolving f with a few different modulations of such a filter. This effectively “bins” the Fourier coefficients of f into different frequency bins. Once an important frequency is isolated in a filtered version of f, the methods from the last subsection can then be used to recover it via, e.g., a modified binary search.

3 Recall that â_ω is the sum of all Fourier coefficients whose indices are congruent to ω modulo 3.
Note that a good continuous filter can be periodized and discretized for use as part of a
discrete SFT. One generally does so in order to design a discrete filter that is highly sparse in
time (i.e., that is “essentially zero” everywhere, except for a small number of time-domain en-
tries). This allows fast convolution calculations to be performed with the filter during frequency
binning. This is crucial since these convolutions are used to repeatedly compute time samples
from filtered versions of f̂ during each modified binary search for an important frequency. Fur-
thermore, the DFT of the discrete filter should also have a special structure in order to aid in
the construction of good, low leakage4 , frequency filters. Suppose, for example, that one wants
to isolate the k-largest entries of f̂ from one another. To accomplish this, one should use a
filter whose DFT looks like a characteristic function on {0, . . . , O(N/k)} ⊂ [0, N ) ∩ Z (i.e., on
a O(1/k)-fraction of the “discrete Fourier spectrum” of f ). The filter can then be modulated
O(k) times in order to create a filter bank with O(k) approximate “pass regions”, each of size
O(N/k), that collectively tile all of [0, N ) ∩ Z. These pass regions form the frequency bins
discussed above (recall Figure 3(b)). To date several different types of filters have been utilized
in sparse FFTs, including Gaussians [HIKP12b], indicator functions [GGI+ 02, GMS05], spike
trains [Iwe10, Iwe13, GHI+ 13, PR13], and Dolph-Chebyshev filters [HIKP12a].

4 We say that a frequency filter leaks if it has non-zero values at frequencies other than our desired values.
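As a rough illustration, the sketch below constructs such a filter — a sinc multiplied by a Gaussian, as in Figure 3 — and modulates it to move its pass band. All parameter choices (the constant c, the bin width N/k) are illustrative rather than tuned, and the helper name is ours:

```python
import numpy as np

def bin_filter(N, k, c=10):
    """A sketch of a binning filter: a sinc times a Gaussian in time, whose DFT
    approximates a box of width ~N/k convolved with a Gaussian (one frequency
    'bin'). The Gaussian window makes the filter essentially zero outside
    O(c*k) time-domain entries (for N large relative to c*k), which is what
    permits fast convolutions."""
    j = np.arange(N) - N // 2                  # centered time index
    band = N // k                              # desired pass-band width
    g = np.sinc(j * band / N)                  # DFT ~ characteristic function of width ~band
    g = g * np.exp(-0.5 * (j / (c * k)) ** 2)  # time-localize; slightly blurs the box edges
    return np.roll(g, -(N // 2)) / g.sum()     # re-center at j = 0 and normalize

N, k = 256, 8
g = bin_filter(N, k)
# Modulating g shifts its pass band: multiplying by e^{2 pi i (m N / k) j / N}
# moves the band to be centered near frequency m * N/k (the m-th bin).
m = 3
g_m = g * np.exp(2j * np.pi * (N // k) * m * np.arange(N) / N)
print(int(np.argmax(np.abs(np.fft.fft(g_m)))))  # a frequency near m * N/k = 96
```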
3.3 Randomly Permuting the Spectrum

As mentioned in the previous subsection, a filter function can be used to isolate the most
important entries of f̂ from one another when they are sufficiently well separated. Unfortunately,
an arbitrary vector f ∈ CN will not generally have a DFT with this property. The largest
magnitude entries of f̂ can appear anywhere in principle. One can compensate for this problem,
however, by pseudo-randomly permuting f̂ so that it “looks” uniformly distributed. As long as
the permutation is reversible, any information gathered from the permuted vector can then be
directly translated into information about the original vector’s DFT, f̂ .
Perhaps the easiest means of randomly permuting f ∈ C^N, and therefore f̂, is to use two basic properties of the Fourier transform: the scaling property, stating that for a_j = f_{cj} we have â_j = f̂_{c^{-1}j} (where c^{-1} is the inverse of c modulo N, assuming it exists); and the modulation property, stating that for a_j = e^{2πi·b·j/N} · f_j we have â_j = f̂_{j−b}. We proceed by choosing two random integers b, c ∈ [0, N), with c invertible modulo N, and defining a ∈ C^N as

$$a_j := e^{2\pi i \cdot b \cdot j / N} \cdot f_{(c \cdot j) \bmod N} \tag{22}$$

for j = 0, . . . , N − 1. It can be seen that â is a permuted version of f̂, as the entry f̂_ω appears in â as entry (ω · c + b) mod N. Note that permutations of this form are not fully random, even though b and c were selected randomly. Nevertheless, they are “random enough” for our purposes. In particular, the probability that any two non-zero coefficients land close to each other can be shown to be small.
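The permutation (22) and its effect on the spectrum are easy to verify numerically (a small demonstration with arbitrary b and c, where c must be invertible modulo N):

```python
import numpy as np

def permute_spectrum(f, b, c):
    """Implements (22): a_j = e^{2 pi i b j / N} * f_{(c j) mod N}.
    The coefficient f_hat[w] then appears in a_hat at position (c*w + b) mod N.
    Requires gcd(c, N) = 1."""
    N = len(f)
    j = np.arange(N)
    return np.exp(2j * np.pi * b * j / N) * f[(c * j) % N]

N = 64
f_hat = np.zeros(N, dtype=complex)
f_hat[[3, 5, 40]] = [1.0, 2.0, 1.5]       # three energetic frequencies, two of them close
f = np.fft.ifft(f_hat * N)                # realize f via (6), matching the 1/N convention of (3)

b, c = 11, 9                              # c = 9 is odd, hence invertible modulo 64
a_hat = np.fft.fft(permute_spectrum(f, b, c)) / N
print(np.flatnonzero(np.abs(a_hat) > 1e-9))         # -> [38 51 56]
print(sorted((c * w + b) % N for w in (3, 5, 40)))  # the predicted positions: [38, 51, 56]
```

Note how the nearby frequencies 3 and 5 are mapped to the well-separated positions 38 and 56, which is exactly the effect the binning step relies on.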
4 Algorithms

In the simplest setting, a sparse FFT is a method designed to approximately compute the discrete Fourier transform (DFT) of a vector f ∈ C^N as quickly as possible under
the presumption that the result, f̂ ∈ CN , will be sparse, or compressible. Here compressible
means that f̂ will have a small number of indices (i.e., frequencies) whose entries (i.e., Fourier
coefficients) have magnitudes that are large compared to the Euclidean norm of f̂ (i.e., the
energy of f̂). Sparse FFTs improve on the runtime of traditional FFTs for such Fourier-sparse
signals by focusing exclusively on identifying energetic frequencies, and then estimating their
Fourier coefficients. This allows sparse FFTs to avoid “wasting time” computing the Fourier
coefficients of many insignificant frequencies.
Although several different sparse FFT variants exist, they generally share a common three
stage approach to computing the sparse DFT of a vector: Briefly put, all sparse FFTs (repeat-
edly) perform some version of the following three steps:

1. identification of the frequencies whose Fourier coefficients are large in magnitude (i.e., the energetic frequencies),

2. accurate estimation of the Fourier coefficients of the frequencies identified in the first step,
and
3. subtraction of the contribution of the partial Fourier representation computed by the first
two steps from the entries of f before any subsequent repetitions.
Generally, each repetition of the three stages above is guaranteed to gather a substantial fraction
of the energy present in f̂ with high probability. Subtracting the located coefficients from the
signal effectively improves the spectral sparsity of the given input vector, f , from one repetition
to the next. The end result is that a small number of repetitions will gather (almost) all of
the signal energy with high probability, thereby accurately approximating the sparse Fourier
transform, f̂ , of the given vector f .
Consider, for example, a vector f whose DFT has 100 nonzero entries (i.e., 100 nonzero
Fourier coefficients). The first round of the three stages above will generally find and accurately
estimate a large fraction of these entries (e.g., three fifths of them, or 60 terms in this case). The
contributions of the discovered terms are then subtracted off of the remaining samples. This
effectively reduces the number of nonzero entries in f̂ , leaving about 40 terms in the current
example. The next repetition of the three stages is now executed as before, but with the smaller
effective sparsity of 40. Eventually all nonzero entries of f̂ will be found and estimated after a
few repetitions with high probability. We will now consider each of the three stages mentioned
above in greater detail.
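To make the interplay of the three stages concrete, the following is a toy end-to-end sparse FFT. It is a sketch under strong assumptions — an exactly sparse, noiseless input of length N = 63 · 64, aliasing with two coprime bin counts as the binning step (in the spirit of [GHI+ 13, PR13]), the two-sample phase test of Section 3.1 for identification, and a sample-domain subtraction for stage (iii), in the spirit of (24) below — and none of its names come from the cited implementations:

```python
import numpy as np
from math import gcd

def sparse_fft_toy(f, rounds=8, seed=1):
    """Toy three-stage sparse FFT (identify / estimate / subtract) for an exactly
    k-sparse, noiseless f with len(f) = 4032 = 63 * 64. Returns {frequency: coefficient}."""
    N = len(f)
    found = {}
    rng = np.random.default_rng(seed)

    def sample(j):  # stage (iii): entries of f minus everything found so far
        j = int(j) % N
        return f[j] - sum(v * np.exp(2j * np.pi * w * j / N) for w, v in found.items())

    for r in range(rounds):
        B = 63 if r % 2 == 0 else 64       # two coprime bin counts, both dividing N
        c = int(rng.integers(1, N))        # random spectrum permutation (Section 3.3)
        while gcd(c, N) != 1:
            c = int(rng.integers(1, N))
        # Bin the permuted signal a_j = f_{(c j) mod N} by aliasing, at time offsets 0 and 1.
        s0 = np.array([sample(c * m * (N // B)) for m in range(B)])
        s1 = np.array([sample(c * (m * (N // B) + 1)) for m in range(B)])
        u0, u1 = np.fft.fft(s0) / B, np.fft.fft(s1) / B
        for bin_k in range(B):
            if abs(u0[bin_k]) < 1e-8:
                continue                               # empty bin
            # Stage (i): an isolated tone w' in this bin satisfies u1/u0 = e^{2 pi i w'/N}.
            wp = int(np.round(np.angle(u1[bin_k] / u0[bin_k]) * N / (2 * np.pi))) % N
            if abs(u1[bin_k] - u0[bin_k] * np.exp(2j * np.pi * wp / N)) > 1e-8:
                continue                               # a collision: retry in a later round
            w = (pow(c, -1, N) * wp) % N               # undo the permutation
            # Stage (ii): u0 is exactly the (residual) coefficient of the isolated tone.
            found[w] = found.get(w, 0) + u0[bin_k]
    return found

# Demo: N = 4032, k = 8 random tones.
rng = np.random.default_rng(0)
N, k = 4032, 8
freqs = rng.choice(N, size=k, replace=False)
f_hat = np.zeros(N, dtype=complex)
f_hat[freqs] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
f = np.fft.ifft(f_hat * N)                             # per (6)
out = sparse_fft_toy(f)
print(sorted(out) == sorted(int(w) for w in freqs))    # True (with high probability)
```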
Stage (i) of each repetition, which identifies frequencies whose Fourier coefficients are large in
magnitude, is generally the most involved of the three repeated stages mentioned above. It
usually consists of several ingredients, including: randomly sampling f in order to randomly
permute its DFT, filtering to separate the permuted Fourier coefficients into different frequency
bands, and estimating the energy in subsets of each of the aforementioned frequency bands.
Many of these ingredients are illustrated with concrete examples in Section 3 above. Our
objective now is to understand the general functionality of the identification stage as a whole.
Roughly speaking, stage (i) works by randomly binning the Fourier coefficients of f into
a small number of “bins” (i.e., frequency bands), and then performing a single frequency
recovery procedure (as described in Section 3.1) within each bin in order to find any energetic
frequencies that may have been isolated there. The randomness is introduced into the Fourier
spectrum of f ∈ CN by randomly subsampling its entries (see Section 3.3). This has the
effect of randomly permuting the entries of f̂ . The resulting “randomized version of f̂ ” is then
binned via a filter bank (i.e., as discussed in Section 3.2). Because f̂ is approximately sparse,
each “frequency bin” is likely to receive exactly one relatively large Fourier coefficient. Each
such isolated Fourier coefficient is then identified by using one of the procedures described in
Section 3.1. The collection of frequencies discovered in each different bin is then saved to be
analyzed further during the estimation stage (ii).
Recall that stage (ii) involves estimating the Fourier coefficient, fˆω , of each frequency ω identi-
fied during stage (i) of the sparse FFT. In the simplest case, this can be done for each such ω
by using L ≪ N independent and uniformly distributed random samples from the entries of f, f_{u_1}, . . . , f_{u_L}, in order to compute the estimator

$$\hat{f}'_\omega := \frac{1}{L} \sum_{l=1}^{L} f_{u_l} \, e^{-2\pi i \cdot \omega \cdot u_l / N}. \tag{23}$$
Note that f̂′_ω is an unbiased estimator for f̂_ω (i.e., E[f̂′_ω] = f̂_ω) whose variance is O(‖f̂‖₂²/L). Thus, the estimator will approximate f̂_ω to high (relative) precision with high probability whenever |f̂_ω|² is large compared to ‖f̂‖₂²/L. In slightly more complicated scenarios, the estimates
might come “for free” as part of the identification stage (see Section 3.2 for an example).
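A quick numerical check of (23) and of the variance bound (all parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, omega = 4096, 256, 137

f_hat = 0.005 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))  # low-energy background
f_hat[omega] = 1.0                                                      # one large coefficient
f = np.fft.ifft(f_hat * N)                   # realize f via (6), with the 1/N convention of (3)

u = rng.integers(0, N, size=L)               # L << N uniformly random sample positions
est = np.mean(f[u] * np.exp(-2j * np.pi * omega * u / N))               # the estimator (23)
print(abs(est - f_hat[omega]))               # roughly ||f_hat||_2 / sqrt(L), i.e. small here
```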
A naive implementation of stage (iii) is even more straightforward than stage (ii). Suppose
that
$$\left\{ \hat{f}'_{\omega_m} \,:\, m = 1, \dots, k \right\} \subset \mathbb{C}$$

is the approximate sparse DFT discovered for f during stages (i) and (ii) of the current repetition of our sparse FFT. Here, ω_1, . . . , ω_k are the frequencies identified during stage (i), while f̂′_{ω_1}, . . . , f̂′_{ω_k} are the estimates for their Fourier coefficients found during stage (ii) (e.g., via (23)). In future iterations of stages (i) through (iii), one can simply replace each sampled entry of f, f_j, with

$$f_j - \sum_{m=1}^{k} \hat{f}'_{\omega_m} \, e^{2\pi i \cdot \omega_m \cdot j / N}. \tag{24}$$
If the entries of f to be used during each iteration of the three stages have been predetermined,
which is often the case, (24) can be used to update them all at once. These “updated samples”
are then used in the subsequent repetitions of the three stages.
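When the sample positions are predetermined, the update (24) is a short vectorized computation (a small illustrative helper, not from the cited implementations):

```python
import numpy as np

def subtract_found(samples, positions, found, N):
    """Residual samples after removing the partial Fourier representation
    'found' (a dict: frequency -> estimated coefficient), as in (24)."""
    j = np.asarray(positions)
    for w, v in found.items():
        samples = samples - v * np.exp(2j * np.pi * w * j / N)
    return samples

# Removing two exactly recovered terms from a two-tone signal leaves ~0:
N = 256
j = np.arange(0, N, 8)                                  # predetermined sample positions
t = np.arange(N)
f = 2.0 * np.exp(2j * np.pi * 10 * t / N) + 0.5 * np.exp(2j * np.pi * 77 * t / N)
print(np.abs(subtract_found(f[j], j, {10: 2.0, 77: 0.5}, N)).max())  # ~1e-15
```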
One of the primary purposes of stage (iii) is to avoid mistakenly identifying insignificant
frequencies as being energetic. Suppose, for example, that a small number of erroneous Fourier
coefficients are identified during the j th repetition of the first two stages. Then, subtracting
their contribution from the original signal samples during the third stage will effectively add
them as new, albeit erroneous, energetic Fourier coefficients in f̂ . This, in turn, allows them to
be corrected in subsequent repetitions of the first two stages. Hence, stage three allows errors
(assuming they are rare) to be corrected in later repetitions.
In contrast, some SFT methods [Iwe10, Iwe13, HIKP12b] perform only stages (i) and (ii) without any stage (iii). These methods identify all the energetic frequencies in stage (i) and then estimate their Fourier coefficients in stage (ii), entirely within a single iteration. Of course, such methods can also mistakenly identify insignificant frequencies as being energetic during stage (i). Such mistaken frequencies are, however, generally discovered to be insignificant by these methods later, during stage (ii), when their Fourier coefficients are estimated.
5 Empirical Evaluation
In this section we compare several existing SFT implementations to FFTW, a fast implemen-
tation of the standard FFT, in order to demonstrate the computational gains that recent SFTs
can provide over the standard FFT when dealing with Fourier-compressible signals. To this
end, we consider the following algorithms and implementations: FFTW, a highly optimized implementation of the standard FFT; AAFFT, an implementation of the algorithm from [GMS05]; and the MIT and ETH implementations of several sFFT variants (SFFT1-MIT, SFFT2-MIT, SFFT1-ETH, SFFT2-ETH, and SFFT3-ETH).

All implementations are freely available.5 The SFT variants considered herein are limited to
those which compute the DFT of a vector (i.e., (2)). Additional experiments involving other
existing SFT variants which sample continuous functions, as well as additional experiments
demonstrating noise tolerance and sampling complexity, can be found here.6
Figure 4 plots the runtimes of these SFTs against FFTW for various sparsity levels, k, and
vector lengths, N . The algorithms were run on randomly generated vectors of length N whose
DFTs were k-sparse (containing k ones in randomly chosen locations) for varying values of k
and N . For each pair of values of k and N , the parameters of the (randomized) algorithms were
optimized in order to minimize the running time while ensuring that the empirical probability of correct recovery was greater than 0.9.

5 FFTW is available at http://www.fftw.org. AAFFT, as well as several significantly faster sampling-based SFTs, are available at http://sourceforge.net/projects/aafftannarborfa/. The ETH implementations are available at http://www.spiral.net/software/sfft.html. All other SFT implementations are available at http://groups.csail.mit.edu/netmit/sFFT/.

6 https://github.com/ludwigschmidt/sft-experiments

Figure 4: Running time plots for several algorithms and implementations of the sparse FFT (AAFFT, FFTW, SFFT1-ETH, SFFT1-MIT, SFFT2-ETH, SFFT2-MIT, and SFFT3-ETH; log-scale axes). (a) Runtime in seconds as a function of the signal length N (from 2^13 to 2^25), for k = 50. (b) Runtime in seconds as a function of the sparsity k (from 50 to 4,000), for N = 2^22.
6 Applications
In this section we give an overview of some of the data-intensive applications of sparse FFT
algorithms and techniques that emerged over the last few years. These applications involve, for
example, GPS receivers, cognitive radios, and, more generally, any analog signal that we wish
to digitize. It is these applications that we focus on as they highlight the role of sparse FFT
algorithms in the signal processing of large data.
GPS synchronization. In the (simplified)7 GPS synchronization problem, we are given a
(pseudorandom) code, corresponding to a particular satellite. The satellite repeatedly transmits
the code. Furthermore, we are given a signal recorded by a GPS receiver, which consists of a
window of the signal generated by the satellite, corrupted by noise. The goal is to align the
code to the recorded signal, i.e., identify where the code starts and ends. To this end, the
receiver computes the convolution of the code and the received signal and reports the shift that
maximizes the correlation. This computation is typically done using the FFT: one applies the
FFT to the code and the signal, computes the product of the outputs, and applies the inverse
FFT to the product.
The paper [HAKI12] uses sparse FFT techniques to speed-up the process. The improvement
is based on the following observation: since the output of the inverse FFT contains a single peak
corresponding to the correct shift, the inverse step can be implemented using the sparse FFT. In fact, since k = 1, the algorithm is particularly simple, and relies on a simple aliasing filter. Furthermore, since the sparse inverse FFT algorithm uses only some of the samples of the product, it suffices to compute only those samples. This reduces the cost of the forward step as well. Experiments on real signals show that the new algorithm reduces the median number of multiplications by a factor of 2.2, or more if the value of the Doppler shift is known.

7 For simplicity, this description ignores certain issues such as the Doppler shift, etc. See [HAKI12] for details.
Spectrum sensing. The goal of a spectrum sensing algorithm is to scan the available spectrum
and identify the “occupied” frequency slots. In many applications this task needs to be done
quickly, since the spectrum changes dynamically. Unfortunately, scanning a GHz-wide spectrum
is a highly power-consuming operation. To reduce the power and acquisition time, one can use
a sparse Fourier transform to compute the frequency representation of a sparse signal without
sampling it at full bandwidth. One such proposal was presented in [YG12] which uses a method
of frequency identification similar to that described in Section 3.1.3. Another approach is
presented in [HSA+ 14], which uses a sparse FFT procedure similar to that in [GHI+ 13, PR13].
It describes a prototype device using 3 software radios (USRPs), each sampling the spectrum
at 50 MHz. The device captures 0.9 GHz, i.e., 6x larger bandwidth than the three USRPs
combined.
Analog to digital converters (ADCs). The random binning procedure described in Sec-
tion 3.3 forms the basis of the pulse-position-modulation (PPM) ADC presented in [YRR+ 12].
A prototype 9-bit random PPM ADC incorporating a pseudo-random sampling scheme is im-
plemented as proof of concept. The approach leverages the energy efficiency of time-based
processing.
Other applications include 2D Correlation Spectroscopy [SAH+ 13].
7 Conclusion
It is interesting to note that SFTs have a good deal in common with compressive sensing
techniques. The latter generally aim to reduce sampling requirements as much as possible in
order to recover accurate sparse approximations of frequency-compressible functions. SFTs, on
the other hand, attempt to recover accurate sparse approximations of frequency-compressible
functions as quickly as absolutely possible. By necessity, therefore, an SFT also cannot sample a
function many times (i.e., since sampling takes time). As a result, SFTs also utilize a relatively
small number of samples and, so, can be considered as compressive sensing algorithms. This
puts SFTs into a broader spectrum of compressive sensing strategies that trade off additional sampling for decreased computational complexity.
SFTs are also closely related to streaming (or sublinear) algorithms developed in the com-
puter science community. Streaming algorithms aim to run in time considerably smaller than
the time required to read the entire original data set, or signal. Hence, the streaming literature
contains a rich set of tools for processing and approximating large data sets both quickly and
accurately. Many of the techniques employed in existing SFTs are adapted to the Fourier setting
from streaming techniques. For an overview of the links between these two topics, see [Ind13].
Acknowledgements
The authors would like to thank Haitham Hassanieh, Chinmay Hegde, David Lawlor, Eric Price
and Jörn Schumacher for providing source code and very helpful discussions.
References
[AGS03] A. Akavia, S. Goldwasser, and S. Safra. Proving hard-core predicates using list
decoding. FOCS, 44:146–159, 2003.
[Aka10] A. Akavia. Deterministic sparse Fourier approximation via fooling arithmetic progressions. COLT, 2010.
[BCG+ 12] P. Boufounos, V. Cevher, A. C. Gilbert, Y. Li, and M. J. Strauss. What’s the fre-
quency, Kenneth?: Sublinear Fourier sampling off the grid. RANDOM/APPROX,
2012.
[GGI+ 02] A. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss. Near-optimal sparse Fourier representations via sampling. STOC, 2002.
[GHI+ 13] B. Ghazi, H. Hassanieh, P. Indyk, D. Katabi, E. Price, and L. Shi. Sample-optimal
average-case sparse Fourier transform in two dimensions. Allerton, 2013.
[GL89] O. Goldreich and L. Levin. A hard-core predicate for all one-way functions. STOC,
pages 25–32, 1989.
[GMS05] A. Gilbert, S. Muthukrishnan, and M. Strauss. Improved time bounds for near-optimal space Fourier representations. SPIE Conference on Wavelets, 2005.
[GST08] A.C. Gilbert, M.J. Strauss, and J. A. Tropp. A tutorial on fast Fourier sampling.
Signal Processing Magazine, 2008.
[HAKI12] H. Hassanieh, F. Adib, D. Katabi, and P. Indyk. Faster GPS via the sparse Fourier
transform. MOBICOM, 2012.
[HIKP12a] H. Hassanieh, P. Indyk, D. Katabi, and E. Price. Nearly optimal sparse Fourier transform. STOC, 2012.
[HIKP12b] H. Hassanieh, P. Indyk, D. Katabi, and E. Price. Simple and practical algorithm
for sparse Fourier transform. SODA, 2012.
[HKPV13] S. Heider, S. Kunis, D. Potts, and M. Veit. A sparse Prony FFT. SAMPTA, 2013.
[HSA+ 14] H. Hassanieh, L. Shi, O. Abari, E. Hamed, and D. Katabi. GHz-wide sensing and decoding using the sparse Fourier transform. INFOCOM, 2014.
[IKP14] P. Indyk, M. Kapralov, and E. Price. (Nearly) sample-optimal sparse Fourier transform. SODA, 2014.
[Ind13] P. Indyk. Sketching via hashing: from heavy hitters to compressed sensing to
sparse Fourier transform. PODS, pages 87–90, 2013.
[Iwe10] M. A. Iwen. Combinatorial sublinear-time Fourier algorithms. Foundations of Computational Mathematics, 10:303–338, 2010.
[Iwe13] M. A. Iwen. Improved approximation guarantees for sublinear-time Fourier algorithms. Applied and Computational Harmonic Analysis, 34:57–82, 2013.
[KM91] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spec-
trum. STOC, 1991.
[LWC13] D. Lawlor, Y. Wang, and A. Christlieb. Adaptive sub-linear time Fourier algo-
rithms. Advances in Adaptive Data Analysis, 5(1), 2013.
[Man92] Y. Mansour. Randomized interpolation and approximation of sparse polynomials. ICALP, 1992.

[PR13] S. Pawar and K. Ramchandran. Computing a k-sparse n-length discrete Fourier transform using at most 4k samples and O(k log k) complexity. ISIT, 2013.
[Sch13] J. Schumacher. High performance sparse fast Fourier transform. Master's thesis, Computer Science, ETH Zurich, Switzerland, 2013.