The FAST Algorithm for Submodular Maximization

Harvard University, Cambridge, MA. Correspondence to: Adam Breuer <[email protected]>, Eric Balkanski <[email protected]>, Yaron Singer <[email protected]>.
Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

In this paper we describe a new parallel algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint k. This algorithm achieves the optimal 1 − 1/e approximation guarantee and is orders of magnitude faster than the state-of-the-art on a variety of experiments over real-world data sets. Following recent work by Balkanski & Singer (2018a), there has been a great deal of research on algorithms whose theoretical parallel runtime is exponentially faster than algorithms used for submodular maximization over the past 40 years. However, while these new algorithms are fast in terms of asymptotic worst-case guarantees, it is computationally infeasible to use them in practice even on small data sets because the number of rounds and queries they require depend on large constants and high-degree polynomials in terms of precision and confidence. The design principles behind the FAST algorithm we present here are a significant departure from those of recent theoretically fast algorithms. Rather than optimize for asymptotic theoretical guarantees, the design of FAST introduces several new techniques that achieve remarkable practical and theoretical parallel runtimes. The approximation guarantee obtained by FAST is arbitrarily close to 1 − 1/e, and its asymptotic parallel runtime (adaptivity) is O(log(n) log²(log k)) using O(n log log(k)) total queries. We show that FAST is orders of magnitude faster than any algorithm for submodular maximization we are aware of, including hyper-optimized parallel versions of state-of-the-art serial algorithms, by running experiments on large data sets.

1. Introduction

In this paper we describe a fast parallel algorithm for submodular maximization.¹ Informally, a function is submodular if it exhibits a natural diminishing returns property. For the canonical problem of maximizing a monotone submodular function under a cardinality constraint, it is well known that the greedy algorithm, which iteratively adds elements whose marginal contribution is largest to the solution, obtains a 1 − 1/e approximation guarantee (Nemhauser et al., 1978) which is optimal for polynomial-time algorithms (Nemhauser & Wolsey, 1978). The greedy algorithm and other submodular maximization techniques are heavily used in machine learning and data mining, as many fundamental objectives such as entropy, mutual information, graph cuts, diversity, and set cover are submodular.

In recent years there has been a great deal of progress on fast algorithms for submodular maximization designed to accelerate computation on large data sets. The first line of work considers serial algorithms where queries can be evaluated on a single processor (Leskovec et al., 2007; Badanidiyuru & Vondrák, 2014; Mirzasoleiman et al., 2015; 2016; Ene & Nguyen, 2019b;c). For serial algorithms the state-of-the-art for maximization under a cardinality constraint is the LAZIER-THAN-LAZY-GREEDY (LTLG) algorithm, which returns a solution that is in expectation arbitrarily close to the optimal 1 − 1/e and does so in a linear number of queries (Mirzasoleiman et al., 2015). This algorithm is a stochastic greedy algorithm coupled with lazy updates, which not only performs well in terms of the quality of the solution it returns, but is also very fast in practice.

Accelerating computation beyond linear runtime requires parallelization. The parallel runtime of blackbox optimization is measured by adaptivity, which is the number of sequential rounds an algorithm requires when polynomially-many queries can be executed in parallel in every round. For maximizing a submodular function defined over a ground set of n elements under a cardinality constraint k, the adaptivity of the naive greedy algorithm is O(k), which in the worst case is O(n). Until recently no algorithm was known to have better parallel runtime than that of greedy.

¹Code is available from www.adambreuer.com/code.
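As a point of reference for the discussion above, here is a minimal sketch of the standard greedy routine on a toy coverage-style objective. The objective, data, and variable names are our own illustration under those assumptions; they are not taken from the paper or its code.

```python
# Minimal sketch of the classic greedy algorithm for maximizing a monotone
# submodular function f under a cardinality constraint k (Nemhauser et al., 1978).
# The coverage objective below is a toy stand-in, not an objective from the paper.

def greedy(f, ground_set, k):
    S = set()
    for _ in range(k):
        # Add the element with the largest marginal contribution f(S + {a}) - f(S).
        best = max((a for a in ground_set if a not in S),
                   key=lambda a: f(S | {a}) - f(S))
        S.add(best)
    return S

# Toy coverage function: each element "covers" a set of items; f(S) = size of the union.
coverage = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 2, 3, 4}}
f = lambda S: len(set().union(*(coverage[a] for a in S)))
print(greedy(f, set(coverage), k=2))  # e.g. {0, 3}
```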
A very recent line of work initiated by Balkanski & Singer (2018a) develops techniques for designing constant approximation algorithms for submodular maximization whose parallel runtime is logarithmic (Balkanski & Singer, 2018b; Balkanski et al., 2018; Ene & Nguyen, 2019a; Fahrbach et al., 2019a;b; Kazemi et al., 2019; Chekuri & Quanrud, 2019a;b; Balkanski et al., 2019a;b; Ene et al., 2019; Chen et al., 2019; Esfandiari et al., 2019; Qian & Singer, 2019). In particular, Balkanski & Singer (2018a) describe a technique called adaptive sampling that obtains in O(log n) rounds a 1/3 approximation for maximizing a monotone submodular function under a cardinality constraint. This technique can be used to obtain an approximation arbitrarily close to the optimal 1 − 1/e in O(log n) rounds (Balkanski et al., 2019a; Ene & Nguyen, 2019a).

1.1. From theory to practice

The focus of the work on adaptivity described above has largely been on conceptual and theoretical contributions: achieving strong approximation guarantees under various constraints with runtimes that are exponentially faster under worst case theoretical analysis. From a practitioner's perspective, however, even the state-of-the-art algorithms in this genre are infeasible for large data sets. The logarithmic parallel runtime of these algorithms carries extremely large constants and polynomial dependencies on precision and confidence parameters that are hidden in their asymptotic analysis. In terms of sample complexity alone, obtaining (for example) a 1 − 1/e − 0.1 approximation with 95% confidence for maximizing a submodular function under cardinality constraint k requires evaluating at least 10^8 (Balkanski et al., 2019a) or 10^6 (Fahrbach et al., 2019a; Chekuri & Quanrud, 2019b) samples of sets of size approximately (n/k) log n in every round. Even if one heuristically uses a single [...]

1.2. Our contribution

In this paper we design a new algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint k. FAST has an approximation ratio that is arbitrarily close to 1 − 1/e, is O(log(n) log²(log k)) adaptive, and uses a total of O(n log log(k)) queries. The main contribution is not in the algorithm's asymptotic guarantees, but in its design that is extremely efficient both in terms of its non-asymptotic worst case query complexity and number of rounds, and in terms of its practical runtime. In terms of actual query complexity and practical runtime, this algorithm outperforms all algorithms for submodular maximization we are aware of, including hyper-optimized versions of LTLG. To be more concrete, we give a brief experimental comparison in the table above for a movie recommendation objective on n = 500 movies against optimized implementations of algorithms with the same adaptivity and approximation (experiment details in Section 4).²

FAST achieves its speedup by thoughtful design that results in frugal worst case query complexity as well as several heuristics used for practical speedups. From a purely analytical perspective, FAST improves the ε dependency in the linear term of the query complexity from at least Õ(ε^{-5} n) in Balkanski et al. (2019a) and Ene & Nguyen (2019a) and Õ(ε^{-3} n) in Fahrbach et al. (2019a) to Õ(ε^{-2} n). We provide the first non-asymptotic bounds on the query and adaptive complexity of an algorithm with sublinear adaptivity, showing dependency on small constants. Our algorithm uses adaptive sequencing (Balkanski et al., 2019b) and multiple optimizations to improve the query complexity and runtime.

• Dependence on large polynomials in ε. Adaptive sampling algorithms crucially rely on sampling, and as a result their query complexity has high polynomial dependency on ε (e.g. at least O(ε^{-5} n) in Balkanski et al. (2019a) and Ene & Nguyen (2019a)). Due to these ε dependencies, the query complexity blows up with any reasonable value for ε. In contrast, adaptive sequencing generates a single random sequence at every iteration. Therefore, in the term that is linear in n we can obtain an ε dependence that is only Õ(ε^{-2}).

• Dependence on large constants. The asymptotic query complexity of previous algorithms depends on very large constants (e.g. at least 60000 in Balkanski et al. (2019a) and Ene & Nguyen (2019a)), making them impractical. As we tried to optimize constants for adaptive sampling, we found that due to the sampling and the requirement to maintain strong theoretical guarantees, the constants cascade and grow through multiple parts of the analysis. In principle, adaptive sequencing does not rely on sampling, which dramatically reduces its dependency on constants.

• Search for position i*. To find the position i*, which is the largest position i ∈ [k] in the sequence such that a large fraction of not-yet-discarded elements have high marginal contribution to S ∪ A_{i−1}, the vanilla adaptive sequencing technique queries the marginal contribution of all elements in X at each of the k positions. This search for i* causes the O(nk) query complexity. Instead, similarly to the search for OPT, we binary search over a set of ε^{-1} log k geometrically increasing values of i that correspond to guesses of i*. This improves the O(nk) dependency on n and k in the query complexity to O(n log(log k)). Then, at any step of the binary search over a position i, instead of evaluating the marginal contribution of all elements in X to S ∪ A_{i−1}, we only evaluate a small sample of elements. In the practical speedups below, we discuss how we can often skip this binary search for i* in practice. A schematic sketch of this search appears after this list.

³Fahrbach et al. (2019a) do some preprocessing to estimate OPT, but it is estimated within some very large constant.
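The following is a schematic sketch of the binary search over geometric position guesses described in the last bullet. It is our own illustration of the idea, not the authors' implementation; the helper `marginal(a, T)`, the threshold `t`, and the sample size `m` are placeholder assumptions.

```python
import math, random

def find_i_star(X, S, A, marginal, t, eps, k, m):
    """Schematic binary search for i*: the largest position i such that at least a
    (1 - eps) fraction of the surviving elements in X still has marginal contribution
    at least t to S united with the first i elements of the random sequence A.
    `marginal(a, T)` stands in for f(T + {a}) - f(T); `m` is the sample size.
    Our own illustration, not the paper's implementation; assumes X is non-empty."""
    num_guesses = max(1, math.ceil(math.log(k) / eps))
    # Roughly (1/eps) * log k geometrically increasing guesses of i*, capped at k.
    guesses = sorted({min(k, math.ceil((1 + eps) ** j)) for j in range(num_guesses)} | {k})

    lo, hi, i_star = 0, len(guesses) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        i = guesses[mid]
        prefix = S | set(A[:i])
        # Evaluate only a small random sample of surviving elements, not all of X.
        sample = random.sample(list(X), min(m, len(X)))
        good_fraction = sum(marginal(a, prefix) >= t for a in sample) / len(sample)
        if good_fraction >= 1 - eps:
            i_star, lo = i, mid + 1   # enough elements still look valuable: move right
        else:
            hi = mid - 1              # too few valuable elements: move left
    return i_star
```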
[Figure 1: four panels plotting objective value vs. k (top row) and parallel time in seconds vs. k (bottom row) for FAST, Am-Filtering, Rand-P-Greedy, Bin-Search-Max, Parallel-Greedy, Parallel-LTLG, and Random.]

Figure 1. Experiment Set 1.a: FAST (blue) vs. low-adaptivity algorithms and PARALLEL-LTLG on graphs (time axis log-scaled).
3.1. Analysis

We show that FAST obtains a 1 − 1/e − ε approximation w.p. 1 − δ and that it has Õ(ε^{-2} log n) adaptive complexity and Õ(ε^{-2} n + ε^{-4} log(n) log(δ^{-1})) query complexity.

Theorem 1. Assume k ≥ 2 log(2δ^{-1}ℓ) / (ε²(1 − 5ε)) and ε ∈ (0, 0.1), where ℓ = log(log k / ε). Then, FAST with sample complexity m = ((2 + ε) / (ε²(1 − 3ε))) log(4ℓ log n / (δε²)) has at most ε^{-2} log(n) ℓ² adaptive rounds, 2ε^{-2} ℓ n + ε^{-2} log(n) ℓ² m queries, and achieves a 1 − 1/e − 4ε approximation with probability 1 − δ.

We defer the analysis to Appendix B. The main part of it is for the approximation guarantee, which consists of two cases depending on the condition which breaks the outer-loop. Lemma 3 shows that when there are ε^{-1} iterations of the outer-loop, the set of elements added to S at every iteration of the outer-loop contributes ε^{-1}(OPT − f(S)). Lemma 5 shows that for the case where |S| = k, the expected contribution of each element a_i added to S is arbitrarily close to (OPT − f(S))/k. For each solution S_v, we need the approximation guarantee to hold with high probability instead of in expectation to be able to binary search over guesses for OPT, which we obtain in Lemma 7 by generalizing the robust guarantees of Hassidim & Singer (2017) in Lemma 6. The main observation to obtain the adaptive complexity (Lemma 6) is that, by definition of i*, at least an ε fraction of the surviving elements in X are discarded at every iteration with high probability.⁴ For the query complexity (Lemma 7), we note that there are |X| + mℓ function evaluations per iteration.

⁴To obtain the adaptivity r with probability 1 and the approximation guarantee w.p. 1 − δ, the algorithm declares failure after r rounds and accounts for this failure probability in δ.

4. Experiments

Our goal in this section is to show that in practice, FAST finds solutions whose value meets or exceeds alternatives in less parallel runtime than both state-of-the-art low-adaptivity algorithms and LAZIER-THAN-LAZY-GREEDY (LTLG). To accomplish this, we build optimized parallel MPI implementations of FAST, other low-adaptivity algorithms, and LTLG, which is widely regarded as the fastest algorithm for submodular maximization in practice. We then use 95 Intel Skylake-SP 3.1 GHz processors on AWS to compare the algorithms' runtime over a variety of objectives defined on 8 real and synthetic datasets. We measure runtime using a rigorous measure of parallel time (see Appendix C.8). Appendices C.1, C.4, C.9, and C.6 contain detailed descriptions of the benchmarks, objectives, implementations, hardware, and experimental setup on AWS.⁵

We conduct two sets of experiments. The first set compares FAST to previous low-adaptivity algorithms on 8 objectives. Since previous algorithms all have practically intractable sample complexity, we grossly reduce their sample complexity to only 95 samples per iteration so that each processor performs a single function evaluation per iteration. This reduction, which we discuss in detail below, gives these algorithms a large runtime advantage over FAST, which computes its full theoretical sample complexity in all experiments.

⁵Code is available from www.adambreuer.com/code.
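As a side note on Theorem 1 (Section 3.1), the short sketch below simply evaluates the theorem's closed-form bounds at example parameter values. The numeric choices and the use of natural logarithms are our own assumptions, and the resulting numbers are conservative worst-case bounds, not the practical runtimes reported in these experiments.

```python
import math

# Evaluate the quantities in Theorem 1 at example values (our own choices).
n, k, eps, delta = 100_000, 10_000, 0.05, 0.05

ell = math.log(math.log(k) / eps)                       # l = log(log k / eps)
assert k >= 2 * math.log(2 * ell / delta) / (eps**2 * (1 - 5 * eps)), "k too small"

m = (2 + eps) / (eps**2 * (1 - 3 * eps)) * math.log(4 * ell * math.log(n) / (delta * eps**2))
rounds = ell**2 * math.log(n) / eps**2                  # adaptive-round bound
queries = 2 * ell * n / eps**2 + rounds * m             # query bound

print(f"l = {ell:.2f}, m = {m:.0f}, rounds <= {rounds:.0f}, queries <= {queries:.2e}")
```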
[Figure 2: panels titled "Traffic Network: n=525", "Movie Recommendation: n=500", "YouTube: n=571", and "Influence Max: n=769", plotting objective value vs. k (top row) and parallel time in seconds vs. k (bottom row) for FAST, Am-Filtering, Rand-P-Greedy, Bin-Search-Max, Parallel-Greedy, Parallel-LTLG, and Random.]

Figure 2. Experiment Set 1.b: FAST (blue) vs. low-adaptivity algorithms and PARALLEL-LTLG on real data (time axis log-scaled).
This is practically feasible for FAST because FAST samples elements, not sets of elements like previous algorithms. Despite the large advantage this setup gives to the previous low-adaptivity algorithms, FAST is consistently one to three orders of magnitude faster.

The second set of experiments compares FAST to PARALLEL-LAZIER-THAN-LAZY-GREEDY (PARALLEL-LTLG) on large-scale data sets. We scale up the 8 objectives to be defined on synthetic data with n = 100000 and real data with up to n = 26000 and various k ranging from k = 25 to k = 25000. We find that FAST is consistently 1.5 to 27 times faster than PARALLEL-LTLG, and its runtime advantage increases in k. These fast relative runtimes are a loose lower bound on FAST's performance advantage, as FAST can reap additional speedups by adding up to n processors, whereas PARALLEL-LTLG performs at most n log(ε^{-1})/k function evaluations per iteration, so using over 95 processors often does not help. In Section 4.1, we show that on many objectives FAST is faster even with only a single processor.

4.1. Experiments set 1: FAST vs. low-adaptivity algorithms

Our first set of experiments compares FAST to state-of-the-art low-adaptivity algorithms. To accomplish this, we built optimized parallel MPI versions of each of the following algorithms: RANDOMIZED-PARALLEL-GREEDY (Chekuri & Quanrud, 2019b), BINARY-SEARCH-MAXIMIZATION (Fahrbach et al., 2019a), and AMORTIZED-FILTERING (Balkanski et al., 2019a). For any given ε > 0 all these algorithms achieve a 1 − 1/e − ε approximation in O(poly(ε^{-1}) log n) rounds.

We also compare these low-adaptivity algorithms to an optimized parallel MPI implementation of LAZIER-THAN-LAZY-GREEDY (LTLG) (Mirzasoleiman et al., 2015) (see Appendix C.11). LTLG is widely regarded as the fastest algorithm for submodular maximization in practice, and it has a (1 − 1/e − ε) approximation guarantee in expectation.

For calibration, we also ran (1) PARALLEL-GREEDY, a parallel version of the standard GREEDY algorithm, as a heuristic upper bound for the objective value, as well as (2) RANDOM, an algorithm that simply selects k elements uniformly at random.

A fair comparison of the low-adaptivity algorithms' parallel runtimes and solution values is to run each algorithm with parameters that yield the same guarantees, for example a 1 − 1/e − ε approximation w.p. 1 − δ with ε = 0.1 and δ = 0.05. However, this is infeasible since the other low-adaptivity algorithms all require a practically intractable number of queries to achieve any reasonable guarantees, e.g. every round of AMORTIZED-FILTERING would require at least 10^8 samples, even with n = 500.

Dealing with benchmarks' practically intractable query complexity. To run other low-adaptivity algorithms despite their huge sample complexity we made two major modifications:

1. Accelerating subroutines. We optimize each of the three other low-adaptivity benchmarks by implementing parallel binary search to replace brute-force search and several other modifications that reduce unnecessary queries (for a full description of these fast implementations, see Appendix C.10). These optimizations result in speedups that reduce their runtimes by an order of magnitude in practice, and our implementations are publicly available in our code base. Despite this, it remains practically infeasible to compute these algorithms' high number of samples in practice even on small problems (e.g. n = 500 elements);
[Figure 3: four panels plotting objective value vs. k (top row) and time in seconds vs. k (bottom row) for FAST, Parallel-LTLG, and Random.]

Figure 3. Experiment Set 2.a: FAST (blue) vs. PARALLEL-LTLG (red) on graphs.
2. Using a single query per processor. Since our interest is in comparing runtime and not quality of approximation, we dramatically lowered the number of queries the three benchmark algorithms require to achieve their guarantees. Specifically, we set the parameters ε and δ for both FAST and the three low-adaptivity benchmarks such that all algorithms guarantee the same 1 − 1/e − 0.1 approximation with probability 0.95 (see Appendix C.3). However, for the low-adaptivity benchmarks, we reduce their theoretical sample complexity in each round to have exactly one sample per processor (instead of their large sample complexity, e.g. 10^8 samples needed for AMORTIZED-FILTERING).

This reduction in the number of samples per round allows the benchmarks to have each processor perform a single function evaluation per round instead of e.g. 10^8/95 function evaluations per processor per round, which 'unfairly' accelerates their runtimes at the expense of their approximations. However, we do not perform this reduction for FAST. Instead, we require FAST to compute the full count of samples for its guarantees. This is feasible since FAST samples elements rather than sets.

Data sets. Even with these modifications, for tractability we could only use small data sets:

• Experiments 1.a: synthetic data sets (n ≈ 500). To compare the algorithms' runtimes under a range of conditions, we solve max cover on synthetic graphs generated via four different well-studied graph models: Stochastic Block Model (SBM); Erdős-Rényi (ER); Watts-Strogatz (WS); and Barabási-Albert (BA). See Appendix C.4.1 for additional details;

• Experiments 1.b: real data sets (n ≈ 500). To compare the algorithms' runtimes on real data, we optimize Sensor Placement on California roadway traffic data; Movie Recommendation on MovieLens data; Revenue Maximization on YouTube Network data; and Influence Maximization on Facebook Network data. See Appendix C.4.3 for additional details.

Results of experiment set 1. Figures 1 and 2 plot all algorithms' solution values and parallel runtimes for various k on synthetic and real data (each point is the mean of 5 trials with the corresponding k). In terms of solution values, across all experiments, values obtained by FAST are nearly indistinguishable from values obtained by GREEDY, the heuristic upper bound. From this comparison, it is clear that FAST does not compromise on the values of its solutions. In terms of runtime, FAST is 36 to 1600 times faster than BINARY-SEARCH-MAXIMIZATION; 7 to 120 times faster than RANDOMIZED-PARALLEL-GREEDY; 4 to 2200 times faster than AMORTIZED-FILTERING; and 1.1 to 7 times faster than PARALLEL-LTLG on the 8 objectives and various k (the time axes of Figures 1 and 2 are log-scaled). Appendix C.12 shows that FAST continues to outperform all benchmarks even (1) when we turn off its lazy updates, and (2) when we run all low-adaptivity benchmarks on just a single 'good' guess for OPT.
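For intuition about the "one function evaluation per processor per round" setup described above, the sketch below shows one round of parallel marginal-contribution queries using mpi4py. It is our own schematic illustration, not the paper's MPI implementation, and the toy objective and candidate list are placeholder assumptions.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def f(S):
    # Placeholder monotone submodular objective (ours): number of distinct items.
    return len(set(S))

# One adaptive round, schematically: the root scatters one candidate element to each
# processor, every processor evaluates a single marginal contribution in parallel,
# and the root gathers the results.
S = [0, 1, 2]                                   # current solution (toy)
candidates = list(range(3, 3 + size)) if rank == 0 else None
a = comm.scatter(candidates, root=0)            # each rank receives one candidate
gain = f(S + [a]) - f(S)                        # one function evaluation on this rank
gains = comm.gather((a, gain), root=0)
if rank == 0:
    print(gains)                                # root collects all gains for this round
```

Run with, e.g., `mpirun -n 4 python round_sketch.py`; each additional process evaluates one more candidate per round at no extra wall-clock cost.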
[Figure 4: panels titled "Traffic Network: n=1885", "Movies: n=3706", "YouTube: n=17,432", and "Influence Max: n=26,588", plotting objective value vs. k (top row) and time in seconds vs. k (bottom row) for FAST, Parallel-LTLG, and Random.]

Figure 4. Experiment Set 2.b: FAST (blue) vs. PARALLEL-LTLG (red) on real data.
We emphasize that FAST's faster runtimes were obtained despite the fact that the three other low-adaptivity algorithms were run with only a single sample per processor each iteration, rather than the 10^8 or 10^6 samples required for their respective guarantees.

4.2. Experiments set 2: FAST vs. Parallel-Lazier-than-Lazy-Greedy

Our second set of experiments compares FAST to the optimized parallel version of LAZIER-THAN-LAZY-GREEDY (PARALLEL-LTLG). Figures 3 and 4 plot solution values and runtimes for various k on large experiments with synthetic and real data (each point is the mean of 5 trials). In terms of solution values, while the two algorithms obtain very similar values across experiments, FAST obtained slightly higher solution values than PARALLEL-LTLG on each of the 8 objectives and all k we tried from k = 25 to 25000. More importantly, runtime disparities between FAST and PARALLEL-LTLG increase in larger k, so larger problems exhibit even greater runtime advantages for FAST.

Furthermore, we emphasize that due to the fact that the sample complexity of PARALLEL-LTLG is less than 95 for many experiments, it cannot achieve better runtimes by using additional processors. FAST's fast relative runtimes are a loose lower bound for what can be obtained on larger-scale hardware and problems. Figure 5 plots FAST's parallel speedups versus the number of processors.

Finally, we note that even on a single processor, FAST is faster on many objectives due to the fact that FAST often uses fewer queries (see Appendix C.13). For example, Figure 5 plots single processor runtimes for the YouTube experiment.

[Figure 5: left panel "1 Processor Runtime: YouTube" plotting time in seconds vs. k for FAST and Parallel-LTLG; right panel plotting FAST's speedup vs. number of processors (1 to 32).]

Figure 5. Single processor runtimes for FAST and PARALLEL-LTLG, and parallel speedups vs. number of processors for FAST for the YouTube experiment. See Appendix C.14 for details.
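As a rough back-of-the-envelope check of the claim that PARALLEL-LTLG's per-iteration sample count n log(ε^{-1})/k often falls below the 95 available processors, the snippet below evaluates it at one of the large-scale problem sizes mentioned above; the specific parameter values are our own illustrative choices.

```python
import math

# PARALLEL-LTLG evaluates about n * log(1/eps) / k marginal contributions per
# iteration; once this count is below the number of processors, extra processors
# sit idle. Values below are illustrative, not settings reported in the paper.
n, k, eps, processors = 100_000, 25_000, 0.1, 95
per_iteration = n * math.log(1 / eps) / k
print(f"{per_iteration:.1f} evaluations per iteration vs. {processors} processors")
# -> roughly 9.2 evaluations per iteration, well under 95 processors
```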
References

Balkanski, E., Rubinstein, A., and Singer, Y. An optimal approximation for submodular maximization under a matroid constraint in the adaptive complexity model. STOC, 2019b.

CalTrans. PeMS: California performance measuring system. http://pems.dot.ca.gov/ [accessed: August 1, 2019].

Chekuri, C. and Quanrud, K. Parallelizing greedy for submodular set function maximization in matroids and beyond. STOC, 2019a.

Chekuri, C. and Quanrud, K. Submodular function maximization in parallel via the multilinear relaxation. SODA, 2019b.

Chen, L., Feldman, M., and Karbasi, A. Unconstrained submodular maximization with constant adaptive complexity. STOC, 2019.

Ene, A. and Nguyen, H. L. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. SODA, 2019a.

Hassidim, A. and Singer, Y. Robust guarantees of stochastic greedy algorithms. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 1424–1432. JMLR.org, 2017.

Kazemi, E., Mitrovic, M., Zadimoghaddam, M., Lattanzi, S., and Karbasi, A. Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. arXiv preprint arXiv:1905.00948, 2019.

Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J. M., and Glance, N. S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007, pp. 420–429, 2007. doi: 10.1145/1281192.1281239. URL https://doi.org/10.1145/1281192.1281239.

Minoux, M. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization techniques, pp. 234–243. Springer, 1978.