The FAST Algorithm for Submodular Maximization

Adam Breuer, Eric Balkanski, Yaron Singer
Harvard University, Cambridge, MA. Correspondence to: Adam Breuer <[email protected]>, Eric Balkanski <[email protected]>, Yaron Singer <[email protected]>.

Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

In this paper we describe a new parallel algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint k. This algorithm achieves the optimal 1 − 1/e approximation guarantee and is orders of magnitude faster than the state-of-the-art on a variety of experiments over real-world data sets. Following recent work by Balkanski & Singer (2018a), there has been a great deal of research on algorithms whose theoretical parallel runtime is exponentially faster than algorithms used for submodular maximization over the past 40 years. However, while these new algorithms are fast in terms of asymptotic worst-case guarantees, it is computationally infeasible to use them in practice even on small data sets because the number of rounds and queries they require depends on large constants and high-degree polynomials in terms of precision and confidence. The design principles behind the FAST algorithm we present here are a significant departure from those of recent theoretically fast algorithms. Rather than optimize for asymptotic theoretical guarantees, the design of FAST introduces several new techniques that achieve remarkable practical and theoretical parallel runtimes. The approximation guarantee obtained by FAST is arbitrarily close to 1 − 1/e, and its asymptotic parallel runtime (adaptivity) is O(log(n) log^2(log k)) using O(n log log(k)) total queries. We show that FAST is orders of magnitude faster than any algorithm for submodular maximization we are aware of, including hyper-optimized parallel versions of state-of-the-art serial algorithms, by running experiments on large data sets.

1. Introduction

In this paper we describe a fast parallel algorithm for submodular maximization. (Code is available from www.adambreuer.com/code.) Informally, a function is submodular if it exhibits a natural diminishing returns property. For the canonical problem of maximizing a monotone submodular function under a cardinality constraint, it is well known that the greedy algorithm, which iteratively adds the element whose marginal contribution to the solution is largest, obtains a 1 − 1/e approximation guarantee (Nemhauser et al., 1978), which is optimal for polynomial-time algorithms (Nemhauser & Wolsey, 1978). The greedy algorithm and other submodular maximization techniques are heavily used in machine learning and data mining, as many fundamental objectives such as entropy, mutual information, graph cuts, diversity, and set cover are submodular.

In recent years there has been a great deal of progress on fast algorithms for submodular maximization designed to accelerate computation on large data sets. The first line of work considers serial algorithms where queries can be evaluated on a single processor (Leskovec et al., 2007; Badanidiyuru & Vondrák, 2014; Mirzasoleiman et al., 2015; 2016; Ene & Nguyen, 2019b;c). For serial algorithms, the state-of-the-art for maximization under a cardinality constraint is the lazier-than-lazy-greedy (LTLG) algorithm, which returns a solution that is in expectation arbitrarily close to the optimal 1 − 1/e and does so in a linear number of queries (Mirzasoleiman et al., 2015). This algorithm is a stochastic greedy algorithm coupled with lazy updates, which not only performs well in terms of the quality of the solution it returns, but is also very fast in practice.

Accelerating computation beyond linear runtime requires parallelization. The parallel runtime of blackbox optimization is measured by adaptivity, which is the number of sequential rounds an algorithm requires when polynomially-many queries can be executed in parallel in every round. For maximizing a submodular function defined over a ground set of n elements under a cardinality constraint k, the adaptivity of the naive greedy algorithm is O(k), which in the worst case is O(n). Until recently, no algorithm was known to have better parallel runtime than that of greedy.
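As context for the discussion above, here is a minimal Python sketch of the standard greedy algorithm (illustrative code of our own, not the paper's implementation; the function name greedy and the set-function interface f are assumptions). It also makes the adaptivity bottleneck concrete: each of the k rounds depends on the solution built in the previous rounds, so the rounds cannot be parallelized even though the queries within a round can.

def greedy(f, N, k):
    """Standard greedy for monotone submodular maximization under |S| <= k.

    f: set function returning a number; N: iterable of ground-set elements.
    Achieves a 1 - 1/e approximation (Nemhauser et al., 1978), but requires
    k sequential rounds: round i cannot start before round i - 1 finishes.
    """
    S = set()
    for _ in range(k):
        base = f(S)
        # All |N \ S| queries within this round are independent (one adaptive
        # round), but the next round depends on which element is chosen here.
        best = max((a for a in N if a not in S),
                   key=lambda a: f(S | {a}) - base,
                   default=None)
        if best is None:
            break
        S.add(best)
    return S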
Algorithm                                                 rounds     queries    time (sec)
AMORTIZED-FILTERING (Balkanski et al., 2019a)                961   2,124,351        20.29
BINARY-SEARCH-MAXIMIZATION (Fahrbach et al., 2019a)        8,744   2,552,028        24.64
RANDOMIZED-PARALLEL-GREEDY (Chekuri & Quanrud, 2019b)         92     148,642         4.11
PARALLEL-LTLG (Mirzasoleiman et al., 2015)                   200         856         0.15
FAST                                                           9       1,598         0.033
A very recent line of work initiated by Balkanski & Singer (2018a) develops techniques for designing constant approximation algorithms for submodular maximization whose parallel runtime is logarithmic (Balkanski & Singer, 2018b; Balkanski et al., 2018; Ene & Nguyen, 2019a; Fahrbach et al., 2019a;b; Kazemi et al., 2019; Chekuri & Quanrud, 2019a;b; Balkanski et al., 2019a;b; Ene et al., 2019; Chen et al., 2019; Esfandiari et al., 2019; Qian & Singer, 2019). In particular, Balkanski & Singer (2018a) describe a technique called adaptive sampling that obtains in O(log n) rounds a 1/3 approximation for maximizing a monotone submodular function under a cardinality constraint. This technique can be used to obtain an approximation arbitrarily close to the optimal 1 − 1/e in O(log n) rounds (Balkanski et al., 2019a; Ene & Nguyen, 2019a).

1.1. From theory to practice

The focus of the work on adaptivity described above has largely been on conceptual and theoretical contributions: achieving strong approximation guarantees under various constraints with runtimes that are exponentially faster under worst-case theoretical analysis. From a practitioner's perspective, however, even the state-of-the-art algorithms in this genre are infeasible for large data sets. The logarithmic parallel runtime of these algorithms carries extremely large constants and polynomial dependencies on precision and confidence parameters that are hidden in their asymptotic analysis. In terms of sample complexity alone, obtaining (for example) a 1 − 1/e − 0.1 approximation with 95% confidence for maximizing a submodular function under cardinality constraint k requires evaluating at least 10^8 (Balkanski et al., 2019a) or 10^6 (Fahrbach et al., 2019a; Chekuri & Quanrud, 2019b) samples of sets of size approximately k/log(n) in every round. Even if one heuristically uses a single sample in every round, other sources of inefficiencies that we discuss throughout the paper prevent these algorithms from being applied even on moderate-sized data sets. The question is then whether the plethora of breakthrough techniques in this line of work of exponentially faster algorithms for submodular maximization can lead to algorithms that are fast in practice for large problem instances.

1.2. Our contribution

In this paper we design a new algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint k. FAST has an approximation ratio that is arbitrarily close to 1 − 1/e, is O(log(n) log^2(log k)) adaptive, and uses a total of O(n log log(k)) queries. The main contribution is not in the algorithm's asymptotic guarantees, but in its design, which is extremely efficient both in terms of its non-asymptotic worst-case query complexity and number of rounds, and in terms of its practical runtime. In terms of actual query complexity and practical runtime, this algorithm outperforms all algorithms for submodular maximization we are aware of, including hyper-optimized versions of LTLG. To be more concrete, we give a brief experimental comparison in the table above for a movie recommendation objective on n = 500 movies against optimized implementations of algorithms with the same adaptivity and approximation (experiment details in Section 4). To obtain these values, we set all algorithms to guarantee a 1 − 1/e − 0.1 approximation with probability 0.95 except LTLG, which has this guarantee in expectation, and we set k = 200.

FAST achieves its speedup by thoughtful design that results in frugal worst-case query complexity, as well as several heuristics used for practical speedups. From a purely analytical perspective, FAST improves the ε dependency in the linear term of the query complexity from at least Õ(ε^-5 n) in Balkanski et al. (2019a) and Ene & Nguyen (2019a) and Õ(ε^-3 n) in Fahrbach et al. (2019a) to Õ(ε^-2 n). We provide the first non-asymptotic bounds on the query and adaptive complexity of an algorithm with sublinear adaptivity, showing dependency on small constants. Our algorithm uses adaptive sequencing (Balkanski et al., 2019b) and multiple optimizations to improve the query complexity and runtime.

1.3. Paper organization

We introduce the main ideas and decisions behind the design of FAST in Section 2. We describe the algorithm and analyze its guarantees in Section 3. We discuss experiments in Section 4.
2. FAST Overview

Before describing the algorithm, we give an overview of the major ideas and discuss how they circumvent the bottlenecks for practical implementation of existing logarithmic-adaptivity algorithms.

Adaptive sequencing vs. adaptive sampling. The large majority of low-adaptivity algorithms use adaptive sampling (Balkanski & Singer, 2018a; Ene & Nguyen, 2019a; Fahrbach et al., 2019a;b; Balkanski et al., 2018; 2019a; Kazemi et al., 2019), a technique introduced in Balkanski & Singer (2018a). These algorithms sample a large number of sets of elements at every iteration to estimate (1) the expected marginal contribution of a random set R to the current solution S and (2) the expected marginal contribution of each element a to R ∪ S. These estimates, which rely on concentration arguments, are then used to either add a random set R to S or discard elements with low expected marginal contribution to R ∪ S.

In contrast, the adaptive sequencing technique, which was recently introduced in Balkanski et al. (2019b), generates at every iteration a single random sequence (a_1, ..., a_|X|) of the elements X not yet discarded. A prefix A_{i*} = (a_1, ..., a_{i*}) of the sequence is then added to the solution S, where i* is the largest position i such that a large fraction of the elements in X have high marginal contribution to S ∪ A_{i−1}. Elements with low marginal contribution to the new solution S are then discarded from X.

The first choice we made was to use an adaptive sequencing technique rather than adaptive sampling:

• Dependence on large polynomials in ε. Adaptive sampling algorithms crucially rely on sampling, and as a result their query complexity has a high polynomial dependency on ε (e.g. at least O(ε^-5 n) in Balkanski et al. (2019a) and Ene & Nguyen (2019a)). Due to these ε dependencies, the query complexity blows up with any reasonable value for ε. In contrast, adaptive sequencing generates a single random sequence at every iteration. Therefore, in the term that is linear in n we can obtain an ε dependence that is only Õ(ε^-2).

• Dependence on large constants. The asymptotic query complexity of previous algorithms depends on very large constants (e.g. at least 60000 in Balkanski et al. (2019a) and Ene & Nguyen (2019a)), making them impractical. As we tried to optimize constants for adaptive sampling, we found that due to the sampling and the requirement to maintain strong theoretical guarantees, the constants cascade and grow through multiple parts of the analysis. In principle, adaptive sequencing does not rely on sampling, which dramatically reduces its dependency on constants.

Negotiating the adaptive complexity with the query complexity. The vanilla version of our algorithm, whose description and analysis are in Appendix A, has at most ε^-2 log n adaptive rounds and uses a total of ε^-2 nk queries to obtain a 1 − 1/e − 3ε/2 approximation, without additional dependence on constants or lower-order terms. In our actual algorithm, we trade a small factor in adaptive complexity for a substantial improvement in query complexity. We do this in the following manner (a minimal sketch of the shared grid-and-binary-search primitive appears after this list):

• Search for estimates of OPT. All algorithms with logarithmic adaptivity require a good estimate of OPT, which can be obtained by running ε^-1 log k instances of the algorithms with different guesses of OPT in parallel, so that one guess is guaranteed to be a good approximation to OPT. (Fahrbach et al. (2019a) do some preprocessing to estimate OPT, but it is estimated within some very large constant.) We accelerate this search by binary searching over the guesses of OPT. A main difficulty when using this binary search is that the approximation guarantee of the solution obtained with each guess of OPT needs to hold with high probability, instead of in expectation, to obtain any guarantee for the global solution. Even though the guarantees on the marginal contributions obtained from each element added to the solution only hold in expectation for adaptive sequencing, we obtain high-probability guarantees for the global solution by generalizing the robust guarantees obtained in Hassidim & Singer (2017) so that they also apply to adaptive sequencing. In the practical speedups below, we discuss how we often need only a single iteration of this binary search in practice;

• Search for position i*. To find the position i*, which is the largest position i ∈ [k] in the sequence such that a large fraction of not-yet-discarded elements have high marginal contribution to S ∪ A_{i−1}, the vanilla adaptive sequencing technique queries the marginal contribution of all elements in X at each of the k positions. This search for i* causes the O(nk) query complexity. Instead, similarly to the search for OPT, we binary search over a set of ε^-1 log k geometrically increasing values i that correspond to guesses of i*. This improves the O(nk) dependency on n and k in the query complexity to O(n log(log k)). Then, at any step of the binary search over a position i, instead of evaluating the marginal contribution of all elements in X to S ∪ A_{i−1}, we only evaluate a small sample of elements. In the practical speedups below, we discuss how we can often skip this binary search for i* in practice.
Practical speedups. We include several ideas which result in considerable speedups in practice without sacrificing approximation, adaptivity, or query complexity guarantees:

• Preprocessing the sequence. At the outset of each iteration of the algorithm, before searching for a prefix A_{i*} to add to the solution S, we first use a preprocessing step that adds high-value elements from the sequence to S. Specifically, we add to the solution S all sequence elements a_i that have high marginal contribution to S ∪ A_{i−1}. After adding these high-value elements, we discard surviving elements in X that have low marginal contribution to the new solution S. In the case where this step discards a large fraction of surviving elements from X, we can also skip this iteration's binary search for i* and continue to the next iteration without adding a prefix to S;

• Number of elements added per iteration. An adaptive sampling algorithm which samples sets of size s adds at most s elements to the current solution at each iteration. In contrast, adaptive sequencing and the preprocessing step described above often allow our algorithm to add a very large number of elements to the current solution at each iteration in practice;

• Single iteration of the binary search for OPT. Even with binary search, running multiple instances of the algorithm with different guesses of OPT is undesirable. We describe a technique that often needs only a single guess of OPT. This guess is the sum v = max_{|S| ≤ k} Σ_{a ∈ S} f(a) of the k highest-valued singletons, which is an upper bound on OPT. If the solution S obtained with that guess v has value f(S) ≥ (1 − 1/e)v, then, since v ≥ OPT, S is guaranteed to obtain a 1 − 1/e approximation and the algorithm does not need to continue the binary search. Note that with a single guess of OPT, the robust guarantees for the binary search are not needed, which improves the sample complexity to m = (2 + ε) / (ε^2 (1 − 3ε)) · log(2δ^-1);

• Lazy updates. There are many situations where lazy evaluations of marginal contributions can be performed (Minoux, 1978; Mirzasoleiman et al., 2015). Since we never discard elements from the solution S, the marginal contributions of elements a to S are non-increasing at every iteration by submodularity. Elements with low marginal contribution c to the current solution at some iteration are ignored until the threshold t is lowered to t ≤ c. Lazy updates also accelerate the binary search over i*.

3. The Algorithm

We describe the FAST-FULL algorithm (Algorithm 1). The main part of the algorithm is the FAST subroutine (Algorithm 2), which is instantiated with different guesses of OPT. These guesses v ∈ V of OPT are geometrically increasing from max_{a ∈ N} f(a) to max_{|S| ≤ k} Σ_{a ∈ S} f(a) by a (1 − ε)^-1 factor, so V contains a value that is a 1 − ε approximation to OPT. The algorithm binary searches over guesses for the largest guess v that obtains a solution S that is a 1 − 1/e approximation to v.

Algorithm 1 FAST-FULL: the full algorithm
  input: function f, cardinality constraint k, parameter ε
  V ← GEOM(max_a f(a), max_{|S| ≤ k} Σ_{a ∈ S} f(a), 1 − ε)
  v* ← B-SEARCH(max{v ∈ V : f(S_v) ≥ (1 − 1/e)v}), where S_v ← FAST(v)
  return S_{v*}

FAST generates at every iteration a uniformly random sequence a_1, ..., a_|X| of the elements X not yet discarded. After the preprocessing step, which adds to S elements guaranteed to have high marginal contribution, the algorithm identifies a position i* in this sequence which determines the prefix A_{i*−1} that is added to the current solution S. Position i* is defined as the largest position such that there is a large fraction of elements in X with high contribution to S ∪ A_{i*−1}. To find i*, we binary search over geometrically increasing positions i ∈ I ⊆ [k]. At each position i, we only evaluate the contributions of elements a ∈ R, where R is a uniformly random subset of X of size m, instead of all elements X.

Algorithm 2 FAST: the Fast Adaptive Sequencing Technique algorithm
  input: function f, cardinality constraint k, guess v for OPT, parameter ε
  S ← ∅
  while |S| < k and number of iterations < ε^-1 do
    X ← N,  t ← (1 − ε)(v − f(S))/k
    while X ≠ ∅ and |S| < k do
      a_1, ..., a_|X| ← SEQUENCE(X, |X|)
      A_i ← (a_1, ..., a_i)
      S ← S ∪ {a_i : f_{S ∪ A_{i−1}}(a_i) ≥ t}
      X_0 ← {a ∈ X : f_S(a) ≥ t}
      if |X_0| ≤ (1 − ε)|X| then
        X ← X_0 and continue to the next iteration
      R ← SAMPLE(X, m),  I ← GEOM(1, k − |S|, 1 − ε)
      R_i ← {a ∈ R : f_{S ∪ A_{i−1}}(a) ≥ t} for i ∈ I
      i* ← B-SEARCH(max{i : |R_i| ≥ (1 − 2ε)|R|})
      S ← S ∪ A_{i*}
  return S

A simplified Python sketch of the FAST subroutine appears below.
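To make the control flow concrete, here is a simplified, unoptimized Python sketch of the FAST subroutine and the single-guess wrapper under stated assumptions: f is a plain black-box set function, the lazy updates and parallelism of the actual implementation are omitted, a linear scan stands in for the binary search over positions, and the names fast, fast_single_guess, and geom_positions are ours for illustration.

import math
import random

def geom_positions(lo, hi, eps):
    """Geometrically increasing integer positions in [lo, hi] (ratio 1/(1 - eps))."""
    out, v = [], float(lo)
    while v < hi:
        out.append(int(math.ceil(v)))
        v /= (1.0 - eps)
    out.append(int(hi))
    return sorted(set(out))

def fast(f, N, k, v, eps, m):
    """Simplified sketch of the FAST subroutine (Algorithm 2).

    f: monotone submodular set function on Python sets; N: ground set (a list);
    k: cardinality constraint; v: guess for OPT; eps: accuracy; m: sample size.
    Lazy updates, parallelism, and the other practical speedups are omitted.
    """
    S = set()
    for _ in range(int(math.ceil(1.0 / eps))):            # outer loop: < 1/eps iterations
        if len(S) >= k:
            break
        X = [a for a in N if a not in S]
        t = (1.0 - eps) * (v - f(S)) / k                   # threshold on marginal values
        while X and len(S) < k:
            seq = list(X)
            random.shuffle(seq)                            # random sequence a_1, ..., a_|X|
            # Preprocessing: add a_i whose marginal contribution to S u A_{i-1}
            # (the solution plus the preceding prefix of the sequence) is >= t.
            for i, a in enumerate(seq):
                if len(S) >= k:
                    break
                prefix = S | set(seq[:i])
                if f(prefix | {a}) - f(prefix) >= t:
                    S.add(a)
            # Discard surviving elements whose marginal contribution to S is < t.
            X0 = [a for a in X if a not in S and f(S | {a}) - f(S) >= t]
            if len(X0) <= (1.0 - eps) * len(X):
                X = X0
                continue
            # Otherwise, find the largest prefix position i* such that at least
            # a (1 - 2*eps) fraction of a random sample R still clears t.
            R = random.sample(X0, min(m, len(X0)))
            i_star = 0
            for i in geom_positions(1, k - len(S), eps):   # linear scan for clarity;
                prefix = S | set(seq[:i - 1])              # the paper binary searches
                good = sum(1 for a in R if f(prefix | {a}) - f(prefix) >= t)
                if good >= (1.0 - 2.0 * eps) * len(R):
                    i_star = i
                else:
                    break
            if i_star == 0:
                break                                      # guard for this sketch only
            S |= set(seq[:i_star])
            X = [a for a in X0 if a not in S]              # housekeeping for the sketch
    return S

def fast_single_guess(f, N, k, eps, m):
    """Single-guess wrapper: v is the sum of the k largest singleton values,
    which upper-bounds OPT (the 'single iteration of the binary search' speedup)."""
    v = sum(sorted((f({a}) for a in N), reverse=True)[:k])
    return fast(f, N, k, v, eps, m)

The sample size m is left as an explicit argument here; in the paper's experiments FAST uses the full theoretical sample complexity from Theorem 1 (Section 3.1).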
[Figure 1. Experiment Set 1.a: FAST (blue) vs. low-adaptivity algorithms and PARALLEL-LTLG on graphs (time axis log-scaled). Panels: SBM graph (n=515), ER graph (n=500), WS graph (n=500), BA graph (n=500); the top row plots objective value vs. k and the bottom row plots time (sec) vs. k. Algorithms shown: FAST, Am-Filtering, Rand-P-Greedy, Bin-Search-Max, Parallel-Greedy, Parallel-LTLG, Random.]

3.1. Analysis

We show that FAST obtains a 1 − 1/e − ε approximation with probability 1 − δ, and that it has Õ(ε^-2 log n) adaptive complexity and Õ(ε^-2 n + ε^-4 log(n) log(δ^-1)) query complexity.

Theorem 1. Assume k ≥ 2 log(2δ^-1 ℓ) / (ε^2 (1 − 5ε)) and ε ∈ (0, 0.1), where ℓ = log(log k / ε). Then, FAST with sample complexity m = (2 + ε) / (ε^2 (1 − 3ε)) · log(4ℓ log n / δ) has at most ε^-2 log(n) ℓ^2 adaptive rounds, uses 2ε^-2 ℓ n + ε^-2 log(n) ℓ^2 m queries, and achieves a 1 − 1/e − 4ε approximation with probability 1 − δ.

We defer the analysis to Appendix B. The main part of it is for the approximation guarantee, which consists of two cases depending on the condition which breaks the outer loop. Lemma 3 shows that when there are ε^-1 iterations of the outer loop, the set of elements added to S at every iteration of the outer loop contributes ε^-1 (OPT − f(S)). Lemma 5 shows that for the case where |S| = k, the expected contribution of each element a_i added to S is arbitrarily close to (OPT − f(S))/k. For each solution S_v, we need the approximation guarantee to hold with high probability instead of in expectation to be able to binary search over guesses for OPT, which we obtain in Lemma 7 by generalizing the robust guarantees of Hassidim & Singer (2017) in Lemma 6. The main observation to obtain the adaptive complexity (Lemma 6) is that, by definition of i*, at least an ε fraction of the surviving elements in X are discarded at every iteration with high probability. (To obtain the adaptivity r with probability 1 − δ and the approximation guarantee with probability 1 − δ, the algorithm declares failure after r rounds and accounts for this failure probability in δ.) For the query complexity (Lemma 7), we note that there are |X| + mℓ function evaluations per iteration.

4. Experiments

Our goal in this section is to show that in practice, FAST finds solutions whose value meets or exceeds alternatives in less parallel runtime than both state-of-the-art low-adaptivity algorithms and LAZIER-THAN-LAZY-GREEDY (LTLG). To accomplish this, we build optimized parallel MPI implementations of FAST, other low-adaptivity algorithms, and LTLG, which is widely regarded as the fastest algorithm for submodular maximization in practice. We then use 95 Intel Skylake-SP 3.1 GHz processors on AWS to compare the algorithms' runtimes over a variety of objectives defined on 8 real and synthetic datasets. We measure runtime using a rigorous measure of parallel time (see Appendix C.8). Appendices C.1, C.4, C.9, and C.6 contain detailed descriptions of the benchmarks, objectives, implementations, hardware, and experimental setup on AWS. (Code is available from www.adambreuer.com/code.)

We conduct two sets of experiments. The first set compares FAST to previous low-adaptivity algorithms on 8 objectives. Since previous algorithms all have practically intractable sample complexity, we grossly reduce their sample complexity to only 95 samples per iteration so that each processor performs a single function evaluation per iteration. This reduction, which we discuss in detail below, gives these algorithms a large runtime advantage over FAST, which computes its full theoretical sample complexity in all experiments.
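The parallel implementations referenced above are MPI programs; purely as an illustration of where the parallelism enters, the sketch below farms the independent marginal-contribution queries of a single adaptive round out to worker processes using Python's multiprocessing (the names marginal_gains and _gain are our own, and this is not the paper's implementation).

from functools import partial
from multiprocessing import Pool

def _gain(f, S, base, a):
    """One independent query: the marginal contribution of a to S."""
    return f(S | {a}) - base

def marginal_gains(f, S, candidates, processes=8):
    """Evaluate the marginal contribution of each candidate to S in parallel.

    All queries within one adaptive round are independent, so they can be
    distributed across workers. f and the elements must be picklable, and on
    platforms that spawn processes this should run under a __main__ guard.
    """
    candidates = list(candidates)
    base = f(S)
    with Pool(processes) as pool:
        gains = pool.map(partial(_gain, f, set(S), base), candidates)
    return dict(zip(candidates, gains))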
[Figure 2. Experiment Set 1.b: FAST (blue) vs. low-adaptivity algorithms and PARALLEL-LTLG on real data (time axis log-scaled). Panels: Traffic Network (n=525), Movie Recommendation (n=500), YouTube (n=571), Influence Max (n=769); the top row plots objective value vs. k and the bottom row plots time (sec) vs. k.]

This is practically feasible for FAST because FAST samples elements, not sets of elements like previous algorithms. Despite the large advantage this setup gives to the previous low-adaptivity algorithms, FAST is consistently one to three orders of magnitude faster.

The second set of experiments compares FAST to PARALLEL-LAZIER-THAN-LAZY-GREEDY (PARALLEL-LTLG) on large-scale data sets. We scale up the 8 objectives to be defined on synthetic data with n = 100000 and real data with up to n = 26000, and various k ranging from k = 25 to k = 25000. We find that FAST is consistently 1.5 to 27 times faster than PARALLEL-LTLG, and its runtime advantage increases in k. These fast relative runtimes are a loose lower bound on FAST's performance advantage, as FAST can reap additional speedups by adding up to n processors, whereas PARALLEL-LTLG performs at most n log(ε^-1)/k function evaluations per iteration, so using over 95 processors often does not help. In Section 4.1, we show that on many objectives FAST is faster even with only a single processor.

4.1. Experiment set 1: FAST vs. low-adaptivity algorithms

Our first set of experiments compares FAST to state-of-the-art low-adaptivity algorithms. To accomplish this, we built optimized parallel MPI versions of each of the following algorithms: RANDOMIZED-PARALLEL-GREEDY (Chekuri & Quanrud, 2019b), BINARY-SEARCH-MAXIMIZATION (Fahrbach et al., 2019a), and AMORTIZED-FILTERING (Balkanski et al., 2019a). For any given ε > 0, all of these algorithms achieve a 1 − 1/e − ε approximation in O(poly(ε^-1) log n) rounds.

We also compare these low-adaptivity algorithms to an optimized parallel MPI implementation of LAZIER-THAN-LAZY-GREEDY (LTLG) (Mirzasoleiman et al., 2015) (see Appendix C.11). LTLG is widely regarded as the fastest algorithm for submodular maximization in practice, and it has a (1 − 1/e − ε) approximation guarantee in expectation.

For calibration, we also ran (1) PARALLEL-GREEDY, a parallel version of the standard GREEDY algorithm, as a heuristic upper bound for the objective value, as well as (2) RANDOM, an algorithm that simply selects k elements uniformly at random.

A fair comparison of the low-adaptivity algorithms' parallel runtimes and solution values would be to run each algorithm with parameters that yield the same guarantees, for example a 1 − 1/e − ε approximation w.p. 1 − δ with ε = 0.1 and δ = 0.05. However, this is infeasible, since the other low-adaptivity algorithms all require a practically intractable number of queries to achieve any reasonable guarantees, e.g. every round of AMORTIZED-FILTERING would require at least 10^8 samples, even with n = 500.

Dealing with benchmarks' practically intractable query complexity. To run the other low-adaptivity algorithms despite their huge sample complexity, we made two major modifications:
[Figure 3. Experiment Set 2.a: FAST (blue) vs. PARALLEL-LTLG (red) on graphs. Panels: SBM graph (n=48,000), ER graph (n=100,000), WS graph (n=100,000), BA graph (n=100,000); the top row plots objective value vs. k and the bottom row plots time (sec) vs. k.]

1. Accelerating subroutines. We optimize each of the three other low-adaptivity benchmarks by implementing parallel binary search to replace brute-force search, and several other modifications that reduce unnecessary queries (for a full description of these fast implementations, see Appendix C.10). These optimizations result in speedups that reduce their runtimes by an order of magnitude in practice, and our implementations are publicly available in our code base. Despite this, it remains practically infeasible to compute these algorithms' high number of samples even on small problems (e.g. n = 500 elements);

2. Using a single query per processor. Since our interest is in comparing runtime and not quality of approximation, we dramatically lowered the number of queries the three benchmark algorithms require to achieve their guarantees. Specifically, we set the parameters ε and δ for both FAST and the three low-adaptivity benchmarks such that all algorithms guarantee the same 1 − 1/e − 0.1 approximation with probability 0.95 (see Appendix C.3). However, for the low-adaptivity benchmarks, we reduce their theoretical sample complexity in each round to exactly one sample per processor (instead of their large sample complexity, e.g. the 10^8 samples needed for AMORTIZED-FILTERING). This reduction in the number of samples per round allows the benchmarks to have each processor perform a single function evaluation per round instead of e.g. 10^8/95 function evaluations per processor per round, which 'unfairly' accelerates their runtimes at the expense of their approximations. However, we do not perform this reduction for FAST. Instead, we require FAST to compute the full count of samples for its guarantees. This is feasible since FAST samples elements rather than sets.

Data sets. Even with these modifications, for tractability we could only use small data sets:

• Experiments 1.a: synthetic data sets (n ≈ 500). To compare the algorithms' runtimes under a range of conditions, we solve max cover on synthetic graphs generated via four different well-studied graph models: Stochastic Block Model (SBM), Erdős-Rényi (ER), Watts-Strogatz (WS), and Barabási-Albert (BA). See Appendix C.4.1 for additional details (a small sketch of this coverage objective appears after this list);

• Experiments 1.b: real data sets (n ≈ 500). To compare the algorithms' runtimes on real data, we optimize Sensor Placement on California roadway traffic data, Movie Recommendation on MovieLens data, Revenue Maximization on YouTube Network data, and Influence Maximization on Facebook Network data. See Appendix C.4.3 for additional details.
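As context for Experiments 1.a, here is a minimal sketch of a max-cover-style objective on a graph: f(S) counts the nodes of S together with their neighbors, which is monotone and submodular. The exact objectives and data preparation are described in Appendix C.4; the snippet below, with names of our choosing, is illustrative only.

import random

def make_coverage_objective(adjacency):
    """Return f(S) = number of nodes covered by S (the nodes of S and their neighbors).

    adjacency: dict mapping each node to a set of its neighbors.
    Coverage functions of this form are monotone and submodular.
    """
    def f(S):
        covered = set(S)
        for node in S:
            covered.update(adjacency.get(node, ()))
        return len(covered)
    return f

# Tiny usage example on a small random (Erdős-Rényi-style) undirected graph.
nodes = list(range(100))
adjacency = {u: set() for u in nodes}
for u in nodes:
    for w in nodes:
        if u < w and random.random() < 0.05:
            adjacency[u].add(w)
            adjacency[w].add(u)

f = make_coverage_objective(adjacency)
print(f({0, 1, 2}))  # objective value of a small seed set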
[Figure 4. Experiment Set 2.b: FAST (blue) vs. PARALLEL-LTLG (red) on real data. Panels: Traffic Network (n=1,885), Movies (n=3,706), YouTube (n=17,432), Influence Max (n=26,588); the top row plots objective value vs. k and the bottom row plots time (sec) vs. k.]

Results of experiment set 1. Figures 1 and 2 plot all algorithms' solution values and parallel runtimes for various k on synthetic and real data (each point is the mean of 5 trials with the corresponding k). In terms of solution values, across all experiments, values obtained by FAST are nearly indistinguishable from values obtained by GREEDY (the heuristic upper bound). From this comparison, it is clear that FAST does not compromise on the values of its solutions. In terms of runtime, FAST is 36 to 1600 times faster than BINARY-SEARCH-MAXIMIZATION; 7 to 120 times faster than RANDOMIZED-PARALLEL-GREEDY; 4 to 2200 times faster than AMORTIZED-FILTERING; and 1.1 to 7 times faster than PARALLEL-LTLG on the 8 objectives and various k (the time axes of Figures 1 and 2 are log-scaled). Appendix C.12 shows that FAST continues to outperform all benchmarks even (1) when we turn off its lazy updates, and (2) when we run all low-adaptivity benchmarks on just a single 'good' guess for OPT. We emphasize that FAST's faster runtimes were obtained despite the fact that the three other low-adaptivity algorithms were run with only a single sample per processor each iteration, rather than the 10^8 or 10^6 samples required for their respective guarantees.

4.2. Experiment set 2: FAST vs. Parallel-Lazier-than-Lazy-Greedy

Our second set of experiments compares FAST to the optimized parallel version of LAZIER-THAN-LAZY-GREEDY (LTLG) (Mirzasoleiman et al., 2015) on large data sets. Specifically, our optimized parallel MPI implementation of LTLG allows us to scale up to random graphs with n ≈ 100000, large real data with n up to 26000, and various k from 25 to 25000 (see Appendix C.11). For these large experiments, running the parallel GREEDY algorithm is impractical. LTLG has a (1 − 1/e − ε) approximation guarantee in expectation, so we likewise set both algorithms' parameters ε to guarantee a (1 − 1/e − 0.1) approximation in expectation (see Appendix C.3).

Results of experiment set 2. Figures 3 and 4 plot solution values and runtimes for various k on large experiments with synthetic and real data (each point is the mean of 5 trials). In terms of solution values, while the two algorithms achieved similar solution values across all 8 experiments, FAST obtained slightly higher solution values than PARALLEL-LTLG on most objectives and values of k.

In terms of runtime, FAST was 1.5 to 32 times faster than PARALLEL-LTLG on each of the 8 objectives and all k we tried from k = 25 to 25000. More importantly, runtime disparities between FAST and PARALLEL-LTLG increase in larger k, so larger problems exhibit even greater runtime advantages for FAST.

Furthermore, we emphasize that because the sample complexity of PARALLEL-LTLG is less than 95 for many experiments, it cannot achieve better runtimes by using more processors, whereas FAST can leverage up to n processors to achieve additional speedups. Therefore, FAST's fast relative runtimes are a loose lower bound for what can be obtained on larger-scale hardware and problems. Figure 5 plots FAST's parallel speedups versus the number of processors we use.

Finally, we note that even on a single processor, FAST is faster than LTLG for reasonable values of k on 7 of the 8 objectives, due to the fact that FAST often uses fewer queries (see Appendix C.13). For example, Figure 5 plots single-processor runtimes for the YouTube experiment.

[Figure 5. Single processor runtimes for FAST and PARALLEL-LTLG, and parallel speedups vs. number of processors for FAST, for the YouTube experiment. See Appendix C.14 for details. Panels: 1 Processor Runtime: YouTube (time (sec) vs. k); Parallel Speedup: YouTube (parallel speedup, serial time/parallel time, vs. number of processors).]
5. Acknowledgements

This research was supported by a Google PhD Fellowship, NSF grant CAREER CCF-1452961, BSF grant 2014389, NSF USICCS proposal 1540428, NSF grant 164732, a Google research award, and a Facebook research award.

References

Badanidiyuru, A. and Vondrák, J. Fast algorithms for maximizing submodular functions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pp. 1497–1514, 2014. doi: 10.1137/1.9781611973402.110. URL https://doi.org/10.1137/1.9781611973402.110.

Balkanski, E. and Singer, Y. The adaptive complexity of maximizing a submodular function. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1138–1151. ACM, 2018a.

Balkanski, E. and Singer, Y. Approximation guarantees for adaptive sampling. In International Conference on Machine Learning, pp. 393–402, 2018b.

Balkanski, E., Breuer, A., and Singer, Y. Non-monotone submodular maximization in exponentially fewer iterations. NIPS, 2018.

Balkanski, E., Rubinstein, A., and Singer, Y. An exponential speedup in parallel running time for submodular maximization without loss in approximation. SODA, 2019a.

Balkanski, E., Rubinstein, A., and Singer, Y. An optimal approximation for submodular maximization under a matroid constraint in the adaptive complexity model. STOC, 2019b.

CalTrans. PeMS: California performance measuring system. http://pems.dot.ca.gov/ [accessed: August 1, 2019].

Chekuri, C. and Quanrud, K. Parallelizing greedy for submodular set function maximization in matroids and beyond. STOC, 2019a.

Chekuri, C. and Quanrud, K. Submodular function maximization in parallel via the multilinear relaxation. SODA, 2019b.

Chen, L., Feldman, M., and Karbasi, A. Unconstrained submodular maximization with constant adaptive complexity. STOC, 2019.

Ene, A. and Nguyen, H. L. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. SODA, 2019a.

Ene, A. and Nguyen, H. L. A nearly-linear time algorithm for submodular maximization with a knapsack constraint. ICALP, 2019b.

Ene, A. and Nguyen, H. L. Towards nearly-linear time algorithms for submodular maximization with a matroid constraint. ICALP, 2019c.

Ene, A., Nguyen, H. L., and Vladu, A. Submodular maximization with matroid and packing constraints in parallel. STOC, 2019.

Esfandiari, H., Karbasi, A., and Mirrokni, V. Adaptivity in adaptive submodularity. arXiv preprint arXiv:1911.03620, 2019.

Fahrbach, M., Mirrokni, V., and Zadimoghaddam, M. Submodular maximization with optimal approximation, adaptivity and query complexity. SODA, 2019a.

Fahrbach, M., Mirrokni, V. S., and Zadimoghaddam, M. Non-monotone submodular maximization with nearly optimal adaptivity and query complexity. ICML, 2019b.

Feldman, M., Harshaw, C., and Karbasi, A. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1), 2015.

Harper, F. M. and Konstan, J. A. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), Article 19, December 2015. doi: 10.1145/2827872.

Hassidim, A. and Singer, Y. Robust guarantees of stochastic greedy algorithms. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 1424–1432. JMLR.org, 2017.

Kazemi, E., Mitrovic, M., Zadimoghaddam, M., Lattanzi, S., and Karbasi, A. Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. arXiv preprint arXiv:1905.00948, 2019.

Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J. M., and Glance, N. S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007, pp. 420–429, 2007. doi: 10.1145/1281192.1281239. URL https://doi.org/10.1145/1281192.1281239.

Minoux, M. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization Techniques, pp. 234–243. Springer, 1978.
Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrák, J., and Krause, A. Lazier than lazy greedy. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

Mirzasoleiman, B., Badanidiyuru, A., and Karbasi, A. Fast constrained submodular maximization: Personalized data summarization. In ICML, pp. 1358–1367, 2016.

Nemhauser, G. L. and Wolsey, L. A. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.

Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.

Qian, S. and Singer, Y. Fast parallel algorithms for statistical subset selection problems. In Advances in Neural Information Processing Systems, pp. 5073–5082, 2019.

Rossi, R. A. and Ahmed, N. K. The network data repository with interactive graph analytics and visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. URL http://networkrepository.com.

Traud, A. L., Mucha, P. J., and Porter, M. A. Social structure of Facebook networks. Physica A, 391(16):4165–4180, August 2012.

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520–525, 2001.
