Reactors: A Case for Predictable, Virtualized Actor Database Systems

Vivek Shah
University of Copenhagen, Denmark
bonii@[Link]

Marcos Antonio Vaz Salles
University of Copenhagen, Denmark
vmarcos@[Link]
ABSTRACT
The requirements for OLTP database systems are becoming ever more demanding. Domains such as finance and computer games increasingly mandate that developers be able to encode complex application logic and control transaction latencies in in-memory databases. At the same time, infrastructure engineers in these domains need to experiment with and deploy OLTP database architectures that ensure application scalability and maximize resource utilization in modern machines. In this paper, we propose a relational actor programming model for in-memory databases as a novel, holistic approach towards fulfilling these challenging requirements. Conceptually, relational actors, or reactors for short, are application-defined, isolated logical actors that encapsulate relations and process function calls asynchronously. Reactors ease reasoning about correctness by guaranteeing serializability of application-level function calls. In contrast to classic transactional models, however, reactors allow developers to take advantage of intra-transaction parallelism and state encapsulation in their applications to reduce latency and improve locality. Moreover, reactors enable a new degree of flexibility in database deployment. We present ReactDB, a system design exposing reactors that allows for flexible virtualization of database architecture between the extremes of shared-nothing and shared-everything without changes to application code. Our experiments illustrate latency control, low overhead, and asynchronicity trade-offs with ReactDB in OLTP benchmarks.

ACM Reference Format:
Vivek Shah and Marcos Antonio Vaz Salles. 2018. Reactors: A Case for Predictable, Virtualized Actor Database Systems. In SIGMOD'18: 2018 International Conference on Management of Data, June 10–15, 2018, Houston, TX, USA. ACM, New York, NY, USA, 16 pages. [Link]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@[Link].
SIGMOD'18, June 10–15, 2018, Houston, TX, USA
© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-4703-7/18/06. $15.00
[Link]

1 INTRODUCTION
Three trends are transforming the landscape of OLTP systems. First, a host of latency-sensitive OLTP applications has emerged in areas as diverse as computer games, high-performance trading, and the web [12, 49, 55]. This trend brings about challenging performance requirements, including mechanisms to allow developers to reason about transaction latencies and scalability of their applications with large data and request volumes [48, 53]. Second, database systems are moving towards solid state, in particular in-memory storage [32], and hardware systems are integrating increasingly more cores in a single machine. This trend brings about new requirements for database architecture, such as processing efficiency in multi-core machines and careful design of concurrency control strategies [52, 57]. Third, there is a need to operate databases out of virtualized infrastructures with high resource efficiency [8, 31, 40]. This trend leads to the requirement that virtualization abstractions for databases impose low overhead and allow for flexible deployments without causing changes to application programs.

Recent research in OLTP databases has shown that addressing all of these requirements is a hard problem. On the one hand, shared-nothing database designs, such as those of H-Store [50] or HyPer [30], fail to provide appropriately for multi-core efficiency in the presence of cross-partition transactions [20, 52]. This is due to the impact of overheads in mapping partitions to cores and of synchronous communication in distributed transactions across partitions. Consequently, transaction throughput and latencies in these systems are very sensitive to how data is partitioned. On the other hand, shared-everything databases have a hard time achieving multi-core scalability. To do so, these systems either internally partition their data structures, e.g., DORA [42], or benefit from affinity of memory accesses to cores in transactions, e.g., Silo [52]. Thus, deployment decisions can affect efficiency and scalability in these systems and are difficult to get right across application classes.

As a consequence, both developers and infrastructure engineers in demanding OLTP domains have a hard time controlling the performance of their transactional databases. Despite advances in profiling tools to identify causes of latency variance in database systems [29], today developers lack clear abstractions to reason at a high level about the interplay of complex, potentially parallelizable application logic and transaction latencies. In addition, the variety of modern in-memory database engines, including numerous specialized designs ranging internally from shared-nothing to shared-everything [43, 47, 56], challenges the ability of infrastructure engineers to flexibly experiment with and adapt database architecture without affecting application code.

Actor programming models provide desirable primitives for concurrent and distributed programming [2, 4, 27], which of late have evoked a strong interest in the database community [9]. To holistically meet the challenging requirements imposed on OLTP systems, we propose a new actor programming model in relational databases called Relational Actors (or reactors for short). Reactors are special types of actors that model logical computational entities encapsulating state abstracted as relations. For example, reactors can represent application-level scaling units such as accounts in a banking application or warehouses in a retail management application.
Within a reactor, developers can take advantage of classic database programming features such as declarative querying over the encapsulated relations. To operate on state logically distributed across reactors, however, developers employ explicit asynchronous function calls. The latter allows developers of latency-sensitive OLTP applications to write their programs so as to minimize cross-reactor accesses or overlap communication with computation. Still, a transaction across multiple reactors provides serializability guarantees as in traditional databases, thus relieving developers from struggling with complex concurrency issues. Reactors allow application-level modeling between the extremes of a relational database (single reactor encapsulating all relations) and key-value stores (each reactor encapsulating a key-value pair).

To address the challenges of architectural flexibility and high resource efficiency in multi-core machines, we design an in-memory database system that exposes reactors as a programming model. This system, called ReactDB (RElational ACTor DataBase), decomposes and virtualizes the notions of sharing of compute and memory in database architecture. First, we introduce a database containerization scheme to enclose shared-memory regions in a machine, each storing state for one or many reactors. Second, within a container, compute resources abstracted as transaction executors can be deployed to either share or own reactors. The combination of these two notions allows infrastructure engineers to experiment with deployments capturing a range of database architecture patterns by simply changing a configuration file. At the same time, no changes are required to application code using reactors.

Example: Digital Currency Exchange. We abstract an application coded using a set of reactors as a reactor database. Consider a simplified digital currency exchange application, in which users may buy or sell currency through their credit card providers. Figure 1 contrasts how such an application would be written with a classic transactional database and a reactor database in parts (a) and (b), respectively. The exchange wishes to limit its settlement risk from cancelled credit card payments. To do so, it follows two application rules: (1) stop accepting orders if any single provider's total unsettled exposure goes above the p_exposure threshold in relation settlement_risk; (2) reject orders that cause the total risk-adjusted exposure across all providers to exceed the g_risk threshold. In part (a), this logic is encoded in a stored procedure that computes risk-adjusted exposure through an expensive function sim_risk and caches its result for a time period. In part (b), the same logic is expressed with reactors. The exchange and each of the providers are modeled as relational actors with private state (relations) that can execute certain procedures. The Exchange reactor can execute auth_pay and encapsulates information about provider names and the settlement limits. Provider reactors store information from risk-adjusted exposure calculations per provider as well as fragments of the relation orders with the payments for each provider, and can execute procedures calc_risk and add_entry. In auth_pay, the Exchange reactor invokes asynchronous calls to Provider reactors, making explicit the available intra-transaction parallelism. Since the exchange strives for the lowest latency possible, this program formulation improves transaction response time with respect to part (a) in a way that is clearly explainable to a developer pursuing application performance optimization. In addition, it becomes explicit in transaction programs that code is conceptually moved close to the data it touches, allowing developers to control for locality. At the same time, ACID properties are guaranteed for auth_pay, despite asynchronous calls to a function, calc_risk, that includes database updates, user-defined abort conditions, and potentially even nondeterminism in the calculation of sim_risk [5]. We further elaborate on this example and our model in Section 2.

How are reactors different from database partitioning? In contrast to database partitioning, which is a data-oriented optimization, reactors represent a compute-oriented abstraction. In the example above, reactors can be used to model horizontal partitioning of tuples, vertical partitioning of relations, or any arbitrary grouping of relation fragments. In addition, reactors allow for functional decomposition and modeling of affinity and parallelism in arbitrary application logic. In the example, detecting and fully exploiting the parallelism of auth_pay in the stored procedure formulation of part (a) in a classic relational database would require complex control-flow analysis, and may not be possible at all [45].

Contributions. In short, we make the following contributions:
(1) We present a novel logical abstraction for relational databases called reactors. This abstraction is grounded on transactional semantics offering serializability and an asynchronous programming model that allows for encoding complex application logic while considering relative latency of different transactional programs (Section 2).
(2) We discuss the design of ReactDB, an in-memory database system exposing reactors. ReactDB enables configuration of database architecture at deployment time in a multi-core machine without changes to application code (Section 3).
(3) In experiments with classic OLTP benchmarks, reactors provide latency control at the microsecond scale for varied program formulations. In addition, for given program formulations, database architecture can be configured to control the trade-off between asynchronicity and load (Section 4).

2 PROGRAMMING MODEL

2.1 Reactor Concepts
In contrast to classic transactional or actor models, reactors bring together all of the following concepts:
(1) A reactor is an application-defined logical actor that encapsulates state abstracted using relations.
(2) Declarative queries are supported only on a single reactor. Communication across reactors is achieved by asynchronous function calls. A computation (function) across reactors consists of a sequence of intra-reactor statements and/or nested cross-reactor function calls.
(3) Computations across reactors have transactional guarantees.
(4) Reactors provide an abstract computational cost model for reflecting on relative latency across program formulations.

2.2 Programming with Reactors
2.2.1 Application-Defined Relational Actors. A reactor is an actor specialized for the management of state abstracted by the relational model. The pseudocode in Figure 2 conceptualizes the capabilities of a reactor. As a regular actor [2], a reactor encapsulates a state, which can be accessed by computations invoked on the reactor. However, unlike in a regular actor, in which communication is
[Figure 1 is a two-part code listing rendered as a figure; only fragments are recoverable. Part (a) shows auth_pay as a single stored procedure over the relations settlement_risk, provider, and orders, using sim_risk and caching its result. Part (b) shows an Exchange reactor whose auth_pay issues asynchronous calc_risk and add_entry calls on Provider reactors (e.g., VISA_DK, MC_US), each encapsulating provider_info and a fragment of orders, while settlement_risk and provider_names are kept on the Exchange reactor.]
Figure 1: A simplified currency exchange application in: (a) the classic transactional model, and (b) the reactor model.
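Since the code of Figure 1(b) is only partially legible above, the following is a minimal, illustrative Python/asyncio sketch of the same control flow: an Exchange reactor fans out asynchronous calc_risk calls to Provider reactors and then conditionally invokes add_entry. It is not ReactDB's API; plain coroutines give none of the transactional guarantees the paper ascribes to auth_pay, and the class structure and parameter values are invented for the example.

```python
import asyncio

class Provider:
    """Stand-in for a Provider reactor: encapsulates a fragment of the orders relation."""
    def __init__(self, name, p_exposure):
        self.name = name
        self.p_exposure = p_exposure
        self.orders = []                     # tuples of (wallet, value, settled)

    async def calc_risk(self):
        # SELECT SUM(value) FROM orders WHERE settled = 'N'
        exposure = sum(value for _, value, settled in self.orders if settled == 'N')
        if exposure > self.p_exposure:
            raise RuntimeError("abort: provider exposure limit exceeded")
        return exposure                      # risk adjustment (sim_risk) elided

    async def add_entry(self, wallet, value):
        self.orders.append((wallet, value, 'N'))

class Exchange:
    """Stand-in for the Exchange reactor: holds settlement limits and provider names."""
    def __init__(self, providers, g_risk):
        self.providers = {p.name: p for p in providers}
        self.g_risk = g_risk

    async def auth_pay(self, pprovider, pwallet, pvalue):
        # Fan calc_risk out to all Provider reactors and overlap the calls,
        # mirroring the asynchronous cross-reactor calls of Figure 1(b).
        risks = await asyncio.gather(*(p.calc_risk() for p in self.providers.values()))
        if sum(risks) + pvalue < self.g_risk:
            await self.providers[pprovider].add_entry(pwallet, pvalue)
        else:
            raise RuntimeError("abort: global risk limit exceeded")

async def main():
    providers = [Provider("VISA_DK", 5000.0), Provider("MC_US", 7000.0)]
    exchange = Exchange(providers, g_risk=10000.0)
    await exchange.auth_pay("VISA_DK", pwallet=43, pvalue=450.0)
    print(providers[0].orders)               # [(43, 450.0, 'N')]

if __name__ == "__main__":
    asyncio.run(main())
```

In ReactDB, by contrast, the whole auth_pay invocation executes as a single serializable transaction even though the nested calls are asynchronous.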
To formalize the correctness of concurrent executions of transactions in reactors, we show equivalence of serializable histories in the reactor model to serializable histories in the classic transactional model. We restrict ourselves exclusively to the notion of conflict-serializability. Technically, our formalization is similar to reasoning on nonlayered object transaction models [54].

2.3.1 Background. We first review the formalism introduced by Bernstein et al. [10, page 27] for the classic transactional model and introduce relevant notation. In this model, the database consists of a collection of named data items, and transactions encapsulate a sequence of operations. A transaction $T_i$ is formalized as a partial ordering of operations with an ordering relation $<_i$ and comprises a set of operations. Operations include reads and writes, along with either a commit or an abort. A read from a data item $x$ is denoted $r_i[x]$, a write to $x$ is denoted $w_i[x]$, a commit is denoted $c_i$, and an abort $a_i$. The ordering relation $<_i$ orders conflicts. Two operations conflict iff at least one of them is a write and both of them reference the same named item. As in [10, page 27], we assume that a transaction does not contain multiple operations of the same type on the same named data item; this assumption has no impact on the results.

2.3.2 Reactor Model. Without loss of generality, we assume reactor names and sub-transaction identifiers to be drawn from the set of natural numbers. Recall that we denote a sub-transaction $j$ in transaction $T_i$ on reactor $k$ by $ST^k_{i,j}$. $r^k_{i,j}[x]$ denotes a read from data item $x$, and $w^k_{i,j}[x]$ denotes a write to data item $x$ in $ST^k_{i,j}$. Note that data items in different reactors are disjoint. Using this notation, we can define a sub-transaction in the reactor model as follows.

Definition 2.1. A sub-transaction $ST^k_{i,j}$ is a partial order with ordering relation $<_{i,j}$ where:
(1) $ST^k_{i,j} \subseteq \{ r^k_{i,j}[x], w^k_{i,j}[x], ST^k_{i,j'} \mid x$ is a data item in $k$, $j'$ is a sub-transaction identifier s.t. $j' \neq j \}$;
(2) Let $basic\_ops(r^k_{i,j}[x]) = \{ r^k_{i,j}[x] \}$, $basic\_ops(w^k_{i,j}[x]) = \{ w^k_{i,j}[x] \}$, and $basic\_ops(ST^k_{i,j}) = \bigcup \{ basic\_ops(o) \mid o \in ST^k_{i,j} \}$; if $o_1 \in ST^k_{i,j} \wedge o_2 \in ST^k_{i,j} \wedge r^k_{i,j'}[x] \in basic\_ops(o_1) \wedge w^k_{i,j''}[x] \in basic\_ops(o_2)$, then either $o_1 <_{i,j} o_2$ or $o_2 <_{i,j} o_1$.

Note that the ordering relation $<_{i,j}$ of a sub-transaction establishes order according to conflicts in leaf-level basic operations. Formally, a transaction comprises exclusively sub-transactions, and the relation $<_i$ orders sub-transactions according to conflicts in their nested basic operations. In the reactor model, two sub-transactions conflict iff the basic operations of at least one of them contain a write and the basic operations of both of them reference the same named item in the same reactor. Under this extended notion of a conflict, the definitions of history, serial history, equivalence of histories, and serializable history in the reactor model are the same as their definitions in the classic transactional model [10], but with sub-transactions replacing basic operations. Similar to nested transaction models [6], we then wish to establish an equivalence between serializability of transactions in the reactor model and serializability in the classic transactional model. To do so, we proceed by defining an appropriate projection of the reactor model into the classic transactional model.

Definition 2.3. The projection of a basic operation $o$ from the reactor model to the classic transactional model, denoted by $P(o)$, is defined as:
(1) $P(r^k_{i,j}[x]) = r_i[k \circ x]$
(2) $P(w^k_{i,j}[x]) = w_i[k \circ x]$
(3) $P(c_i) = c_i$
(4) $P(a_i) = a_i$
where $\circ$ denotes concatenation.

The definition provides a name mapping from the disjoint address spaces of reactors to a single address space, which is done by concatenating the reactor identifier with the name of a data item.
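As a made-up worked example (not taken from the paper), consider a transaction $T_1$ whose only sub-transactions read item $x$ on reactor 2 and write item $y$ on reactor 3 before committing; Definition 2.3 projects its basic operations as follows:

```latex
% Illustrative projection of the basic operations of T_1 (example only):
\begin{align*}
  P(r^{2}_{1,1}[x]) &= r_1[2 \circ x], &
  P(w^{3}_{1,2}[y]) &= w_1[3 \circ y], &
  P(c_1) &= c_1,
\end{align*}
% so the disjoint items x (on reactor 2) and y (on reactor 3) become the
% distinct named items "2 o x" and "3 o y" in a single address space.
```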
Definition 2.4. The projection of a sub-transaction $ST^k_{i,j}$ from the reactor model to the classic transactional model, denoted by $P_S(ST^k_{i,j})$, is a partial order with ordering relation $<^{i,j}_S$:
(1) $P_S(ST^k_{i,j}) \subseteq \{ P(o) \mid o \in basic\_ops(ST^k_{i,j}) \}$;
(2) if $o_1 \in ST^k_{i,j} \wedge o_2 \in ST^k_{i,j} \wedge o_1 <_{i,j} o_2 \wedge o_1, o_2$ are reads or writes, then $P(o_1) <^{i,j}_S P(o_2)$;
(3) if $ST^k_{i,j'} \in ST^k_{i,j}$, then $<^{i,j}_S$ is extended by $<^{i,j'}_S$;
(4) if $o_1 \in ST^k_{i,j} \wedge o_1$ is a read or a write $\wedge\ ST^k_{i,j'} \in ST^k_{i,j} \wedge o_1 <_{i,j} ST^k_{i,j'}$, then $P(o_1) <^{i,j}_S P_S(ST^k_{i,j'})$;
(5) if $o_2 \in ST^k_{i,j} \wedge o_2$ is a read or a write $\wedge\ ST^k_{i,j'} \in ST^k_{i,j} \wedge ST^k_{i,j'} <_{i,j} o_2$, then $P_S(ST^k_{i,j'}) <^{i,j}_S P(o_2)$;
(6) if $ST^k_{i,j'} \in ST^k_{i,j} \wedge ST^k_{i,j''} \in ST^k_{i,j} \wedge ST^k_{i,j'} <_{i,j} ST^k_{i,j''}$, then $P_S(ST^k_{i,j'}) <^{i,j}_S P_S(ST^k_{i,j''})$.

Definition 2.5. The projection of a transaction $T_i$ from the reactor model to the classic transactional model, denoted by $P_T(T_i)$, is a partial order with ordering relation $<_{T_i}$:
(1) $P_T(T_i) \subseteq (\bigcup_{ST^k_{i,j} \in T_i} P_S(ST^k_{i,j})) \cup \{ P(o) \mid o \in T_i \wedge o$ is a commit or abort$\}$;
(2) $<_{T_i}$ is extended by $\bigcup_{ST^k_{i,j} \in T_i} <^{i,j}_S$;
(3) if $ST^k_{i,j} \in T_i \wedge ST^{k'}_{i,j'} \in T_i \wedge ST^k_{i,j} <_i ST^{k'}_{i,j'} \wedge o_1 \in P_S(ST^k_{i,j}) \wedge \dots$

Figure 3 (cost model of a sub-transaction; only partially visible in this excerpt):

$L(ST^k_{i,j}) = P_{seq}(ST^k_{i,j}) + \sum_{ST^k_{i,j'} \in sync_{seq}(ST^k_{i,j})} L(ST^k_{i,j'}) + \sum_{k' \in dest(sync_{seq}(ST^k_{i,j}))} \big( C_s(k,k') + C_r(k',k) \big) + \max\Big( \max_{ST^{k'}_{i,j'} \in async(ST^k_{i,j})} \big( L(ST^{k'}_{i,j'}) + C_r(k',k) \big), \dots \Big)$
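To illustrate how the visible part of this cost equation composes, the following Python sketch evaluates it bottom-up over a small fork-join tree of sub-transactions. The parameter values are invented, the truncated second argument of the outer max is ignored, and charging one round trip per synchronous call is a simplification of the per-destination sum above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical calibrated parameters (microseconds); not measured values.
C_S = 5.0   # cost of dispatching a call to another reactor
C_R = 7.0   # cost of returning control/results to the caller

@dataclass
class SubTxn:
    reactor: int
    p_seq: float                                  # processing cost of this sub-transaction's own logic
    sync: List["SubTxn"] = field(default_factory=list)
    asynch: List["SubTxn"] = field(default_factory=list)

def latency(st: SubTxn) -> float:
    """Evaluate the visible terms of the cost model for a fork-join sub-transaction tree."""
    total = st.p_seq
    # Synchronous children: their latencies plus one round trip each (simplified).
    for child in st.sync:
        total += latency(child) + C_S + C_R
    # Asynchronous children overlap with each other; only the slowest one matters here.
    if st.asynch:
        total += max(latency(child) + C_R for child in st.asynch)
    return total

# A root sub-transaction on reactor 0 that debits locally and credits two remote
# reactors asynchronously, roughly in the spirit of the "opt" transfer formulation.
root = SubTxn(reactor=0, p_seq=10.0,
              asynch=[SubTxn(reactor=1, p_seq=8.0), SubTxn(reactor=2, p_seq=8.0)])
print(f"predicted latency: {latency(root):.1f} usec")
```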
[Plots omitted; axes: Avg latency [msec] vs. Txn size (Figures 5 and 6, with series such as fully-sync, opt, their -pred counterparts, async-execution, and commit+input-gen), and Throughput [Ktxn/sec] vs. Workers (Figure 7).]
Figure 5: Latency vs. size and different user program formulations.
Figure 6: Latency breakdown into cost model components.
Figure 7: TPC-C throughput with varying load at scale factor 4.
fully-sync and opt program formulations. The predicted values are labeled fully-sync-pred and opt-pred.

We break down transaction latencies into the following components: (a) sync-execution: the cost of processing the logic in the transaction and in synchronous sub-transactions, corresponding to the first two components of the cost equation in Figure 3; (b) Cs and Cr: the forward and backward costs of communication between reactors in the third component of the cost equation; (c) async-execution: the cumulative execution cost of all asynchronous sub-transactions overlapped with any synchronous sub-transactions and processing logic, corresponding to the fourth component of the cost equation; (d) commit + input-gen: the cost of the commit protocol, including OCC and 2PC, along with the time to generate the inputs for the transaction. The latter cost component is not shown in Figure 3 since it only applies to root transactions and not to any sub-transaction in the reactor programming model. As such, we would expect the bulk of the difference between the predicted and observed performance to be explainable by this cost component.

Figure 6 shows that the predicted breakdown for the cost components closely matches the latencies profiled for actual executions in ReactDB, even at such a fine granularity. The slight difference in the overlap of different bars is within the variance of the observed vs. the calibration measurement, and expected especially since calibration measures parameters within the 5 µsec range. For a transaction size of one, we can see that opt has the same performance behavior as fully-sync. This effect arises because the destination transaction executor for the credit in the transfer is the same as the source transaction executor for the debit, resulting in a synchronous execution of the credit and debit sub-transactions similar to fully-sync. As we increase the transaction size, the number of transaction executors spanned by the transaction increases, and the execution costs of fully-sync grow because of increasing costs in sync-execution, Cs and Cr. We again observe here cost asymmetry between Cs and Cr, arising for the same reasons remarked in Section 4.2.1. For opt, we do not observe any sync-execution costs, since all credit sub-transactions are overlapped with each other and with the single debit on the source reactor. The growth in the async-execution cost of opt with increasing transaction size is caused by the rising communication cost for the sequence of credit sub-transactions, i.e., the last asynchronous credit sub-transaction incurs a cumulative cost of communication of all asynchronous sub-transactions before it in the sequence.

In summary, we observe that the latencies of transactions can be reliably profiled in our system ReactDB into the components of the cost model of Figure 3, even in a challenging scenario where the cost of the processing logic in the benchmark is extremely small and comparable to the cost of communication across reactors. This evidence indicates that it is possible for developers to observe and explain the latency behavior of programs with reactors and to reformulate their programs for better performance.

4.3 Virtualization of Database Architecture
4.3.1 Effect of Load. We evaluate the behavior of the three database architecture deployments described in Section 3.3 with the standard mix of the TPC-C benchmark under increasing load from client workers. We fixed the database size to a scale factor of four, corresponding to four warehouse reactors. Correspondingly, we deployed four transaction executors in all configurations, but increased the number of workers invoking transactions on the database from 1 to 8. As such, we expect the database to get progressively more overloaded from 5 to 8 workers.
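Section 3.3, which defines these deployments, is not part of this excerpt. As a purely illustrative stand-in, the sketch below encodes the three configurations as data; the descriptor schema (containers, executors, reactors, routing) is invented here and does not reflect ReactDB's actual configuration file format.

```python
# Hypothetical deployment descriptors for a 4-warehouse TPC-C database.
# The field names are invented for illustration only.
WAREHOUSE_REACTORS = [1, 2, 3, 4]

shared_nothing_async = {
    # One shared-memory container with one transaction executor per warehouse
    # reactor; cross-reactor sub-transactions are dispatched asynchronously.
    "containers": [{"executors": 1, "reactors": [w]} for w in WAREHOUSE_REACTORS],
    "routing": "affinity",
}

shared_everything_with_affinity = {
    # A single container (one shared-memory region) with four executors, each
    # preferentially serving the transactions of one warehouse reactor.
    "containers": [{"executors": 4, "reactors": WAREHOUSE_REACTORS}],
    "routing": "affinity",
}

shared_everything_without_affinity = {
    # Same single container, but transactions are load-balanced round-robin
    # across executors regardless of the warehouse they touch.
    "containers": [{"executors": 4, "reactors": WAREHOUSE_REACTORS}],
    "routing": "round-robin",
}
```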
Figures 7 and 8 show the average throughput and latency of successful transactions in this scenario. We observe that shared-everything-with-affinity outperforms the other deployment configurations, since it exploits memory access affinities and minimizes cross-core communication overheads. Even though shared-nothing-async attempts to overlap cross-reactor sub-transactions as much as possible, the relative cost of dispatching sub-transactions with limited processing logic makes it more expensive than shared-everything-with-affinity, which employs direct memory accesses. We observe that the differences in throughput between the two methods are not large between 1 and 4 workers, since there is only a 1% chance of items for stock update in the new-order transaction and a 15% chance of customer lookups in the payment transaction being remote. Since workers employ client affinity to warehouses, note that for 5 workers, there are two worker threads generating workload for the first warehouse; for eight workers, there are two worker threads generating workload for every warehouse. So from 5 to 8 workers, the possibility of conflicts between concurrent transactions to the same warehouses arises, especially for the payment and new-order transactions.

As expected, therefore, from 1 to 4 workers, abort rates for all deployments are negligible and close to nil.
[Plots omitted; axes: Avg latency [msec] or Throughput [txn/sec] vs. Workers, comparing the shared-everything-without-affinity, shared-nothing-async, and shared-everything-with-affinity deployments.]
Figure 8: TPC-C latency with varying load at scale factor 4.
Figure 9: Throughput of new-order-delay transactions with varying load.
Figure 10: Latency of new-order-delay transactions with varying load.
Abort rates then go up to 4.1% for shared-everything-without-affinity and 5.72% for shared-nothing-async for eight workers. Interestingly, shared-everything-with-affinity is resilient to this effect, because each transaction executor runs a transaction to completion before picking up the next one from its queue. In other words, since shared-everything-with-affinity does not employ asynchronicity, transaction executor threads never block, and executors can operate with a multi-programming level of one. At the same time, affinity in routing ensures that each transaction executor services transactions for a different warehouse reactor, preserving identical execution behavior to that observed for workers 1 to 4 despite changes in the conflict patterns in workload generation. Even though shared-nothing-async also ensures affinity in transaction routing, it uses cooperative multi-threading and a higher multi-programming level for greater hardware utilization, and ends up executing multiple transactions concurrently from a given transaction executor's queue.

While we increase the number of client workers, we must remember that the database system processing resources in the form of transaction executors remain fixed. So the throughput on average increases for all the deployments because the hardware utilization goes up. The hardware utilization on each of the transaction executor cores for shared-everything-with-affinity increases from 0 to 83% between 1 and 4 workers, and is pushed further to 99% as we get to eight workers. Given the use of asynchronicity in shared-nothing-async, this deployment uses all cores from the start, with a hardware utilization for one worker on the four transaction executor cores of 76%, 2.5%, 2.5% and 2.5%, respectively. The hardware utilization of transaction executor cores with this deployment is uniform at 79% with four workers, and keeps on rising up to 98% for eight workers. For shared-everything-without-affinity, the hardware utilization is uniform throughout, rising from 17% to 66% from 1 to 4 workers and reaching only 84% with eight workers.

We observe that shared-everything-without-affinity exhibits the worst performance. At one worker, every transaction invocation is routed by the round-robin load balancing router to a different transaction executor than the one that processed the last request, amplifying cross-core communication. As workers are added this effect diminishes; however, the lower hardware utilization and the eventual rise in abort rates limit the efficiency of this architecture.

In short, we remark that the capability to virtualize database architecture allows ReactDB to be configured to maximize hardware utilization and minimize conflicts in a standard OLTP benchmark. Moreover, we observe that asynchronicity in transactions engenders a trade-off between communication overheads and processing costs. We validate this observation by fitting our cost model to the TPC-C new-order transaction in Appendix C, and explore asynchronicity trade-offs further in the next section.

4.3.2 Asynchronicity Tradeoffs. To additionally drill down on the potential benefits of asynchronicity under concurrency, we evaluate in this section the two database architectures shared-nothing-async and shared-everything-with-affinity under varying load. We control the amount of load by varying the number of workers from 1 to 8, while keeping the number of warehouses constant at a scale factor of eight. For clarity, we focus exclusively on new-order transactions. Each new-order consists of between 5-15 items and we force each of the items to be drawn from a remote warehouse with equal probability. Since the default new-order formulation has limited parallelism in the logic executed at remote warehouses, we augmented the logic for stock data update with an artificial delay between 300 and 400 µsec by generating random numbers to model stock replenishment calculations. This increases the overall work in the transaction without increasing its data footprint and the contention on the database.
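The exact padding computation is not shown in the paper; as a hedged sketch of the general idea (not the benchmark code), a stock update could be extended with a busy computation along these lines:

```python
import random
import time

def stock_replenishment_padding(min_us: int = 300, max_us: int = 400) -> int:
    """Busy-loop for a randomly chosen 300-400 microsecond interval by generating
    random numbers, so the extra work consumes CPU time without enlarging the
    transaction's data footprint or its contention footprint."""
    budget = random.uniform(min_us, max_us) / 1_000_000.0
    deadline = time.perf_counter() + budget
    draws = 0
    while time.perf_counter() < deadline:
        random.random()
        draws += 1
    return draws
```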
Figures 9 and 10 show the throughput and latency, respectively, of running 100% new-order-delay transactions under increasing load. With one worker, the throughput of shared-nothing-async is double that of shared-everything-with-affinity. The former executes all the stock updates across 5-6 remote warehouses asynchronously (average distinct remote warehouses chosen from 7 using a uniform distribution), fully utilizing the available hardware parallelism, while the latter executes the entire transaction logic sequentially. Although shared-nothing-async incurs higher communication cost in dispatching the stock updates to be performed by different warehouse reactors, the greater amount of work in each stock update makes it worthwhile in comparison to sequential shared memory accesses in shared-everything-with-affinity. Conversely, as we increase the number of workers and thus pressure on resources, the throughput of shared-nothing-async starts growing less than that of shared-everything-with-affinity. Note that the abort rate for the deployments was negligible (0.03-0.07%), highlighting the limited amount of contention on actual items.

In summary, these results suggest that the most effective database architecture may change depending on load conditions when asynchronicity can be exploited by transaction code.
Under high load, shared-everything-with-affinity exhibits the best performance among the architectures evaluated, since it reduces overhead at the expense of not utilizing at all intra-transaction parallelism. On the other hand, when load conditions are light to normal and when transaction logic comprises enough parallelism, shared-nothing-async can achieve substantially higher throughput and lower latency. To further validate these observations, we evaluate in Appendix D the effects of varying cross-reactor accesses in the TPC-C benchmark under conditions of high load.

5 RELATED WORK
In-memory OLTP Databases. H-Store [50] and HyPer [30] follow an extreme shared-nothing design by having single-threaded execution engines responsible for each data partition. As a result, single-partition transactions are extremely fast, but multi-partition transactions and skew greatly affect system throughput. LADS [56] improves upon this limitation by merging transaction logic and eliminating multi-partition synchronization through dynamic analysis of batches of specific transaction classes. In contrast to these shared-nothing engines, shared-everything lock-based OLTP systems specifically designed for multi-cores, such as DORA [42] and PLP [43], advocate partitioning of internal engine data structures for scalability. Orthrus [47] partitions only the lock manager and utilizes a message-passing design for lock acquisition to reduce lock contention across multiple cores for contended workloads.

In contrast to the baked-in architectural approach of earlier engines, ReactDB borrows the highly-scalable OCC implementation of Silo [52], building on top of it a virtualization layer that allows for flexible architectural deployments, e.g., as a classic shared-everything engine, a shared-nothing engine, or an affinity-based shared-everything engine. In addition, ReactDB is not restricted to specific transaction classes, supporting transactions with, e.g., user-defined aborts, conditionals, and range queries. Finally, ReactDB is the first engine realizing the programming model of reactors.

Transactional Partitioned Data Stores. A class of systems provides transactional support over key-value stores as long as keys are co-located in the same machine or key group [16, 17]. Warp [25], in contrast, provides full transaction support with nested transactions, but limits query capabilities, e.g., neither predicate reads nor relational query support are provided. The limited transactional support and low-level storage-based programming model make it difficult to express applications, as opposed to the reactor programming model, which provides serializable transactions with relational query capabilities. Recent work has also focused on enhancing concurrency through static analysis of transaction programs [38, 58]. The latter could be assimilated in the implementation of ReactDB's concurrency control layers as future work.

Asynchronous Programming. As mentioned previously, reactors are a novel restructuring in the context of databases of the actor model [2]. In contrast to regular actors, reactors comprise an explicit memory model with transactions and relational querying, substantially simplifying program logic. These features make the reactor model differ significantly from the virtual actors of Orleans [7] and from other actor-based frameworks [4, 27]. Recent work in Orleans has focused on a vision of integrating traditional data-management functionality in a virtual actor runtime for the middle tier of a classic three-tier architecture [9, 23]. This approach is complementary to our work of integrating actor features in a database system, i.e., enriching the data tier itself. Additionally, ReactDB comprises building a high-performance, scalable, multi-core OLTP system with an actor-oriented programming model and latency control, which is not the target design and feature set of the vision for Orleans [9].

As explained in Section 2.2, reactors are related to the early work on Argus [36] because of the asynchronous transactional programming model supporting nested function calls; however, the reactor programming model is substantially different from that of Argus. First, the use of a relational data and query model is a central idea of reactors, but not of Argus. Note that the latter is not a simple restriction of the former, because the programming issues handled by a relational abstraction, e.g., physical data independence, would need to be coded from scratch at a very low level in Argus. Second, user-defined logical actors are a central idea of reactors, but not of Argus. Guardians in Argus employ multiple concurrent processes explicitly, while reactors abstract away the mapping to threads in ReactDB. Third, reasoning about latency from the programming model is a central idea of reactors, but again not of Argus. Even though Argus has low-level asynchronous calls, it lacks an explicit cost model of synchronous and asynchronous communication. On the system implementation level, ReactDB is an OLTP database system designed for low-overhead virtualization of database architecture, which was never the focus of Argus. These differences to Argus also distinguish our work from a large class of object-oriented distributed computing and operating systems [11, 14, 18, 33, 41].

Database Virtualization. Virtualization of database engines for cloud computing has focused on particular target database architectures, e.g., shared-nothing databases with transactional support only within partitions [8] or distributed control architectures with weaker consistency guarantees [31]. By contrast, ReactDB offers infrastructure engineers the possibility to configure database architecture itself by containerization, while maintaining a high degree of transaction isolation. Our results support recent observations of the low overhead of using container mechanisms together with an in-memory database [39], while showing that even more flexibility in database architecture can be achieved at negligible cost.

The design of ReactDB is reminiscent of work on OLTP on hardware islands [44] and on the cost of synchronization primitives [19]. Complementary to recent work on compiler optimizations [46], reactors can be seen as a step towards improving programmability and support for stored procedures in database systems.

6 CONCLUSION
In this paper, we introduced reactors, a new relational abstraction for in-memory OLTP databases. Reactors comprise an asynchronous programming model that allows encoding of intra-transaction parallelism in database stored procedures while maintaining serializability. We presented the design of ReactDB, the first implementation of reactors. ReactDB enables flexible and controllable database architecture configuration at deployment time. Reactors open up a variety of directions for future work, ranging from reactor database modeling to efficient mapping of reactors to distributed hardware architectures.

Acknowledgments. We thank the anonymous reviewers for their insightful comments and Phil Bernstein for discussions and suggestions, which have helped us improve this paper.
REFERENCES
[1] Alok Aggarwal and Jeffrey Scott Vitter. 1988. The Input/Output Complexity of Sorting and Related Problems. Commun. ACM 31, 9 (1988), 1116–1127.
[2] Gul Agha. 1986. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, USA.
[3] Mohammad Alomari, Michael J. Cahill, Alan Fekete, and Uwe Röhm. 2008. The Cost of Serializability on Platforms That Use Snapshot Isolation. In Proc. ICDE. 576–585.
[4] Joe Armstrong. 2010. Erlang. Commun. ACM 53, 9 (2010), 68–75.
[5] Subi Arumugam, Ravi Jampani, Luis Leopoldo Perez, Fei Xu, Christopher M. Jermaine, and Peter J. Haas. 2010. MCDB-R: Risk Analysis in the Database. PVLDB 3, 1 (2010), 782–793.
[6] Catriel Beeri, Philip A. Bernstein, and Nathan Goodman. 1989. A Model for Concurrency in Nested Transactions Systems. J. ACM 36, 2 (1989), 230–269.
[7] Philip Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, and Jorgen Thelin. 2014. Orleans: Distributed Virtual Actors for Programmability and Scalability. Technical Report MSR-TR-2014-41. Microsoft Research.
[8] Philip A. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kalhan, Gopal Kakivaya, David B. Lomet, Ramesh Manne, Lev Novik, and Tomas Talius. 2011. Adapting Microsoft SQL Server for cloud computing. In Proc. ICDE. 1255–1263.
[9] Philip A. Bernstein, Mohammad Dashti, Tim Kiefer, and David Maier. 2017. Indexing in an Actor-Oriented Database. In Proc. CIDR.
[10] Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. 1987. Concurrency Control and Recovery in Database Systems. Addison-Wesley.
[11] Kenneth P. Birman. 1985. Replication and Fault-Tolerance in the ISIS System. In Proc. SOSP. 79–86.
[12] Andrew Brook. 2015. Low-latency distributed applications in finance. Commun. ACM 58, 7 (2015), 42–50.
[13] Michael J. Carey, David J. DeWitt, and Scott L. Vandenberg. 1988. A Data Model and Query Language for EXODUS. In Proc. ACM SIGMOD. 413–423.
[14] Panos K. Chrysanthis, Krithi Ramamritham, David W. Stemple, and Stephen Vinter. 1986. The Gutenberg Operating System Kernel. In Proc. FJCC. 1159–1167.
[15] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proc. ACM SoCC. 143–154.
[16] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. 2010. G-Store: a scalable data store for transactional multi key access in the cloud. In Proc. ACM SoCC. 163–174.
[17] Sudipto Das, Amr El Abbadi, and Divyakant Agrawal. 2009. ElasTraS: An Elastic Transactional Data Store in the Cloud. In Workshop on Hot Topics in Cloud Computing, HotCloud.
[18] Partha Dasgupta, Richard J. LeBlanc, and William F. Appelbe. 1988. The Clouds Distributed Operating System. In Proc. ICDCS. 2–9.
[19] Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. 2013. Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask. In Proc. SOSP. 33–48.
[20] Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. 2013. Hekaton: SQL Server's Memory-optimized OLTP Engine. In Proc. ACM SIGMOD. 1243–1254.
[21] Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudré-Mauroux. 2013. OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. PVLDB 7, 4 (2013), 277–288.
[22] Aleksandar Dragojević, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. 2015. No Compromises: Distributed Transactions with Consistency, Availability, and Performance. In Proc. SOSP. 54–70.
[23] Tamer Eldeeb and Phil Bernstein. 2016. Transactions for Distributed Actors in the Cloud. Technical Report MSR-TR-2016-1001. Microsoft Research. [Link]transactions-distributed-actors-cloud-2/
[24] E. N. Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson. 2002. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. 34, 3 (2002), 375–408.
[25] Robert Escriva, Bernard Wong, and Emin Gün Sirer. 2015. Warp: Lightweight Multi-Key Transactions for Key-Value Stores. CoRR abs/1509.07815 (2015).
[26] H-Store 2017. H-Store Supported Benchmarks (Smallbank). (May 2017). http://[Link]/documentation/deployment/benchmarks/
[27] Yaroslav Hayduk, Anita Sobe, and Pascal Felber. 2015. Dynamic Message Processing and Transactional Memory in the Actor Model. In Proc. DAIS. 94–107.
[28] Joseph Hellerstein, Michael Stonebraker, and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends in Databases 1, 2 (2007), 141–259.
[29] Jiamin Huang, Barzan Mozafari, Grant Schoenebeck, and Thomas F. Wenisch. 2017. A Top-Down Approach to Achieving Performance Predictability in Database Systems. In Proc. ACM SIGMOD. 745–758.
[30] Alfons Kemper and Thomas Neumann. 2011. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. In Proc. ICDE. 195–206.
[31] Donald Kossmann, Tim Kraska, and Simon Loesing. 2010. An evaluation of alternative architectures for transaction processing in the cloud. In Proc. ACM SIGMOD. 579–590.
[32] Per-Åke Larson (Ed.). 2013. IEEE Data Engineering Bulletin: Special Issue on Main-Memory Database Systems. Vol. 36, number 2.
[33] Edward D. Lazowska, Henry M. Levy, Guy T. Almes, Michael J. Fischer, Robert J. Fowler, and Stephen C. Vestal. 1981. The Architecture of the Eden System. In Proc. SOSP. 148–159.
[34] Christophe Lécluse and Philippe Richard. 1989. The O2 Database Programming Language. In Proc. VLDB. 411–422.
[35] Daan Leijen, Wolfram Schulte, and Sebastian Burckhardt. 2009. The design of a task parallel library. In Proc. OOPSLA. 227–242.
[36] Barbara Liskov, Dorothy Curtis, Paul Johnson, and Robert Scheifler. 1987. Implementation of Argus. In Proc. SOSP. 111–122.
[37] B. Liskov and L. Shrira. 1988. Promises: Linguistic Support for Efficient Asynchronous Procedure Calls in Distributed Systems. In Proc. PLDI. 260–267.
[38] Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, and Jinyang Li. 2014. Extracting More Concurrency from Distributed Transactions. In Proc. OSDI. 479–494.
[39] Tobias Mühlbauer, Wolf Rödiger, Andreas Kipf, Alfons Kemper, and Thomas Neumann. 2015. High-Performance Main-Memory Database Systems and Modern Virtualization: Friends or Foes?. In Proc. DanaC. 4:1–4:4.
[40] Vivek R. Narasayya, Sudipto Das, Manoj Syamala, Badrish Chandramouli, and Surajit Chaudhuri. 2013. SQLVM: Performance Isolation in Multi-Tenant Relational Database-as-a-Service. In Proc. CIDR.
[41] Howard Oakley. 1989. Mercury: an operating system for medium-grained parallelism. Microprocessors and Microsystems - Embedded Hardware Design 13, 2 (1989), 97–102.
[42] Ippokratis Pandis, Ryan Johnson, Nikos Hardavellas, and Anastasia Ailamaki. 2010. Data-oriented Transaction Execution. PVLDB 3, 1-2 (2010), 928–939.
[43] Ippokratis Pandis, Pinar Tözün, Ryan Johnson, and Anastasia Ailamaki. 2011. PLP: Page Latch-free Shared-everything OLTP. PVLDB 4, 10 (2011), 610–621.
[44] Danica Porobic, Ippokratis Pandis, Miguel Branco, Pinar Tözün, and Anastasia Ailamaki. 2012. OLTP on Hardware Islands. PVLDB 5, 11 (2012), 1447–1458.
[45] PostgreSQL Documentation 2017. Parallel Safety in PostgreSQL 10 (Section 15.4). (October 2017). [Link]
[46] Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, César A. Galindo-Legaria, and Conor Cunningham. 2017. Optimization of Imperative Programs in a Relational Database. PVLDB 11, 4 (2017), 432–444.
[47] Kun Ren, Jose M. Faleiro, and Daniel J. Abadi. 2016. Design Principles for Scaling Multi-core OLTP Under High Contention. In Proc. ACM SIGMOD. 1583–1598.
[48] R. Shoup and D. Pritchett. 2006. The eBay architecture. In SD Forum.
[49] Michael Stonebraker. 2012. New opportunities for New SQL. Commun. ACM 55, 11 (2012), 10–11.
[50] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. 2007. The End of an Architectural Era: (It's Time for a Complete Rewrite). In Proc. VLDB. 1150–1160.
[51] TPCC 2017. The TPC-C Benchmark. (May 2017). [Link]
[52] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-memory Databases. In Proc. SOSP. 18–32.
[53] D. V. Pattishall. 2007. Federation at Flickr (doing billions of queries per day). In MySQL Conference.
[54] Gerhard Weikum and Gottfried Vossen. 2002. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery. Morgan Kaufmann.
[55] Walker M. White, Christoph Koch, Nitin Gupta, Johannes Gehrke, and Alan J. Demers. 2007. Database research opportunities in computer games. SIGMOD Record 36, 3 (2007), 7–13.
[56] Chang Yao, Divyakant Agrawal, Gang Chen, Qian Lin, Beng Chin Ooi, Weng-Fai Wong, and Meihui Zhang. 2016. Exploiting Single-Threaded Model in Multi-Core In-Memory Systems. IEEE Trans. Knowl. Data Eng. 28, 10 (2016), 2635–2650.
[57] Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael Stonebraker. 2014. Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores. PVLDB 8, 3 (2014), 209–220.
[58] Yang Zhang, Russell Power, Siyuan Zhou, Yair Sovran, Marcos K. Aguilera, and Jinyang Li. 2013. Transaction chains: achieving serializability with low latency in geo-distributed storage systems. In Proc. SOSP. 276–291.
[59] Wenting Zheng, Stephen Tu, Eddie Kohler, and Barbara Liskov. 2014. Fast Databases with Fast Durability and Recovery Through Multicore Parallelism. In Proc. OSDI. 465–477.

A PROOF OF THEOREM 2.7
Let us assume H is serializable and H′ is not serializable. From the serializability theorem, since H is serializable, the serializability graph of H (SG(H)) is acyclic; since the projected history H′ is not serializable, the serializability graph SG(H′) must be cyclic.
Therefore, there must exist a cycle $T'_i \rightarrow \ldots \rightarrow T'_j \rightarrow \ldots \rightarrow T'_i$. Since the graph is built on operations of the classic transactional model, there must be conflicting operations $o'_i <_{P(H)} \ldots <_{P(H)} o'_j <_{P(H)} \ldots <_{P(H)} o'_i$. By condition (3) of Definition 2.6, there must exist sub-transactions $ST^k_{i,l} \in T_i$ and $ST^{k'}_{j,l'} \in T_j$ such that $ST^k_{i,l} <_H \ldots <_H ST^{k'}_{j,l'} <_H \ldots <_H ST^k_{i,l}$. As a result, SG(H) must be cyclic, and we arrive at a contradiction. To show the reverse direction, it is simple to follow a similar argument, but starting with a cyclic graph in SG(H) and showing that SG(H′) must be cyclic as well, in contradiction.
contradiction.
sub-transactions invoked in multi_update become synchronous.
B EFFECT OF SKEW AND QUEUING ON This effect arises since the communication overhead to dispatch
LATENCY a sub-transaction to a remote reactor is greater than the time to
In this appendix, we drill down experimentally into the limitations process a single update. This trend is also captured in our cost
discussed for our cost model in Section 2.4, evaluating a scenario model prediction, 1 worker pred, which shows decreasing average
with queuing delays and skew. latency as the zipfian constant increases to 0.99. Interestingly, the
Setup. We created a separate setup to evaluate limitations in ex- curve predicts that the average latency should increase after 0.99,
planatory ability of our cost model by augmenting the YCSB bench- which conflicts with our observation. This is because the cost of
mark [15] with a multi_update transaction that updates 10 keys. processing actually decreases when all updates are to the same
We model each key as a reactor, and the multi_update trans- key in the transaction. In this case, the read in a read-modify-write
action invokes a read-modify-write update sub-transaction asyn- can just return the previous updated value from the transaction’s
chronously for each key (record size of 100 bytes). Unlike in the write-set instead of accessing the index structure, which was not
setup with Smallbank, we configure ReactDB such that transac- accounted for while calibrating processing cost. In addition to 1
tions may not realize full parallelism at the system level. In particu- worker pred, we also show a curve adding to the prediction the
lar, we chose a scale factor of 4 where each scale factor corresponds measured costs of input generation and commitment. We remind
to 10,000 keys. Four containers are deployed, each with one transac- the reader that the latter two costs are not part of the equation in
tion executor, and assigned 10,000 contiguous reactors correspond- Figure 3, and as in Section 4.2.2, the difference between predicted
ing to the scale factor. Furthermore, we select the reactor where and observed latencies for a single worker are largely accounted
the multi_update transaction is invoked randomly from the set of by these two factors.
10 keys in the update, and choose the keys for multi_update from With four workers, queueing and skew together lead to both
a zipfian distribution. As such, we model program logic where it higher and more variable latencies overall, which are as expected
is not possible to determine statically how many sub-transactions not captured by the cost model. Abort rates also increase, going
will be realized by the system synchronously or asynchronously. from 0.26% to 3.24% as the zipfian constant increases from 0.01 to
However, to ensure that transactions remain fork-join, we sorted 0.99. Hardware utilization on the transaction executor core hosting
the keys to be updated in a multi_update transaction such that the most accessed reactors reflects the trend in the figure, corre-
keys deployed in remote transaction executors precede the ones in sponding to 63%, 92% and 100% at zipfian constants 0.01, 0.5 and
the local transaction executor where the transaction is initiated. Fi- 0.99. At 5.0 skew, a single reactor is accessed, eliminating variability
nally, the experiment is performed both with one and four workers but not queueing.
to simulate absence and presence of queuing delays, respectively.
Results. We employ the multi_update transaction under a 100% C COST MODEL VALIDATION WITH TPC-C
mix. Recall that our setup explained above explicitly restricts the NEW-ORDER
parallelism that can be realized by the system. So to be able to We observed in Section 4.3.1 that shared-nothing-async under-
fit our cost model, we recorded the average sizes of the realized performs compared to shared-everything-with-affinity even with
syncovp (STi,k j ) and async(STi,k j ) sequences with the zipfian distri- a single client worker, despite having more transaction executors
bution employed. Similar to the experiment in Section 4.2.2, we available for transaction processing. To better understand this effect,
calibrated the average communication and processing cost param- we turned to our cost model for validation of a mix consisting of
eters by profiling the multi_update transaction with updates to 100% new-order transactions with four warehouses and four trans-
a single key chosen from a uniform distribution. We emphasize action executors. Similarly to what was done in Section 4.2.2 and
that the cost model abstracts intra-transaction parallelism to aid Appendix B, we calibrated cost model parameters with a new-order
developers in contrasting transaction program formulations, and transaction requesting only one item from a local and one from a
thus does not capture interference among concurrent workers or remote warehouse. Moreover, we recorded the average numbers
queueing effects. As such, we expect cost model predictions to be of synchronous and asynchronous stock-update requests realized
most accurate in configurations where a single worker is deployed, with a single worker, as well as the average numbers of items in-
as in Section 4.2. volved. To explore the impact of communication overheads, we
Figure 11 shows the observed behavior of average latency with evaluated two extremes with respectively 1% and 100% probability
varying skew, as captured by the zipfian constant used to select of cross-reactor accesses in stock updates (see also Appendix D).
[Plots omitted; Figure 11 plots Avg latency [msec] for the series 1 worker obs, 4 workers obs, 1 worker pred, and 1 worker pred + input-gen + commit; Figures 12 and 13 compare the shared-everything-without-affinity, shared-nothing-async, shared-nothing-sync, and shared-everything-with-affinity deployments, with Throughput [Ktxn/sec] on the y-axis of Figure 13.]
Figure 11: Effect of skew and queuing on latency.
Figure 12: Latency of cross-reactor TPC-C new-order (scale factor 8).
Figure 13: TPC-C throughput with varying deployments.
Table 1 summarizes the results obtained with one and four workers. We obtain excellent fit between the cost model prediction after including commit and input generation cost (Pred+C+I) and the observed latency for one worker under both 1% and 100% cross-reactor accesses. We observed that the cost of communication between reactors, especially Cr as remarked in Section 4.2.1, is high compared with the processing cost of stock updates. However, the relatively small growth in latency for 100% cross-reactor accesses with one worker compared with 1% cross-reactor accesses shows the benefits of overlapping multiple asynchronous sub-transactions across reactors. With four workers, queueing effects manifest with 100% cross-reactor accesses; these effects are, as discussed in Appendix B, not part of the cost model, though prediction is still accurate for 1% cross-reactor accesses.

The hardware utilization with one worker on the four transaction executor cores under 1% cross-reactor accesses is 78%, 2.3%, 2.3%, and 2.3%, respectively, while the values under 100% cross-reactor accesses are 65%, 24%, 24%, and 24%. The use of four workers increases utilization, which becomes uniform at 82% and 86% under 1% and 100% cross-reactor accesses, respectively.

D EFFECT OF CROSS-REACTOR TRANSACTIONS

In this appendix, we evaluate the impact of cross-reactor transactions on the performance of the different database architectures, complementing the results of Section 4.3. For clarity, we focus exclusively on new-order transactions without any additional computations. To vary the percentage of cross-reactor new-order transactions, we vary the probability that a single item in the transaction is drawn from a remote warehouse (the remote warehouses are again chosen with equal probability). Each new-order consists of between 5 and 15 items.
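As a concrete illustration of this input generation, the sketch below draws an order's items and, with the configured probability, redirects exactly one item to a uniformly chosen remote warehouse. The helper names and the item-id range are illustrative placeholders, not ReactDB's benchmark driver code.

    import random

    # Hypothetical input generator for the cross-reactor experiment.
    def gen_new_order_input(home_wh, all_whs, p_remote):
        n_items = random.randint(5, 15)   # each new-order has 5-15 items
        items = [{"item_id": random.randint(1, 100000), "supply_wh": home_wh}
                 for _ in range(n_items)]
        remote_whs = [w for w in all_whs if w != home_wh]
        # With probability p_remote, a single item is supplied by a remote
        # warehouse chosen with equal probability.
        if remote_whs and random.random() < p_remote:
            items[random.randrange(n_items)]["supply_wh"] = random.choice(remote_whs)
        return items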
Figure 12 shows the average latencies of running the TPC-C benchmark with 100% new-order transactions at a scale factor of eight, i.e., eight warehouses receive transactions from eight workers at peak load. Due to space constraints, we omit measured throughputs. Since ReactDB uses Silo's OCC protocol, and owing to the low contention in TPC-C even upon increasing the number of remote items, we would expect the latency of shared-everything-with-affinity to be agnostic to changes in the proportion of cross-reactor transactions, as per the results in [52]. However, we see a small gradual increase in latency for all the deployments. We believe these effects are a consequence of cache coherence and cross-core communication overheads. We observe further that both the shared-nothing-sync and shared-nothing-async configurations exhibit the same latency at 0% cross-reactor transactions as shared-everything-with-affinity. However, there is a sharp drop in the performance of the shared-nothing deployments from 0% to 10% cross-reactor transactions. This effect is in line with our previous observation that sub-transaction invocations require expensive migration of control, in contrast to both shared-everything-without-affinity and shared-everything-with-affinity. Note that the abort rate for all deployments remained negligible (0.02%-0.04%), highlighting the limited amount of contention on actual items.

We observe that shared-nothing-async exhibits higher resilience to an increase in cross-reactor transactions when compared with shared-nothing-sync. In particular, the latency of shared-nothing-async is better by roughly a factor of two at 100% cross-reactor transactions. This is because shared-nothing-async employs new-order transactions with asynchronous sub-transaction invocations on remote warehouse reactors, and tries to overlap remote sub-transaction invocation with execution of logic locally on a warehouse reactor. This demonstrates how application programs can leverage the programming model to engineer application code using reactors with different performance characteristics. At the same time, infrastructure engineers can select the database architecture that best fits the execution conditions for the workload without changes to application code. In the case of peak load and limited intra-transaction parallelism, shared-everything-with-affinity turned out to be the best architecture among the ones considered for this scenario, in line with the results of [52].
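To make the difference between the two shared-nothing formulations concrete, the sketch below contrasts synchronous and asynchronous invocation of remote stock-update sub-transactions. The call API (call, call_async, get) and helper names are illustrative placeholders rather than ReactDB's actual procedure syntax.

    # Illustrative contrast of the two shared-nothing formulations.
    def remote_stock_updates_sync(ctx, remote_items):
        # shared-nothing-sync: each remote sub-transaction blocks the caller,
        # so communication and remote processing costs add up per item.
        for it in remote_items:
            ctx.call(it["supply_wh"], "update_stock", it["item_id"], it["qty"])

    def remote_stock_updates_async(ctx, remote_items, local_work):
        # shared-nothing-async: invocations are issued up front and overlap
        # with local order-entry logic; results are awaited only at the end.
        futures = [ctx.call_async(it["supply_wh"], "update_stock",
                                  it["item_id"], it["qty"])
                   for it in remote_items]
        local_work()
        for f in futures:
            f.get()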
E SCALE-UP AND OVERHEAD IN REACTDB

This appendix further complements Section 4.3 by presenting transactional scale-up in ReactDB to the degree allowed by our hardware setup and discussing containerization overhead.

Transactional Scale-Up. In this section, we evaluate the scalability of ReactDB across multiple cores for the three database architecture deployments described in Section 3.3. Figure 13 shows the average transaction throughput of running the TPC-C transaction mix as we increase the number of warehouses (reactors). While we have also measured latencies, we omit the results due to space constraints. Note that we configure the number of transaction executors to be equal to the scale factor for the experiment.

We observe that the shared-everything-without-affinity deployment exhibits the worst throughput scalability among the deployments selected. This effect is a consequence of shared-everything-without-affinity's poor ability to exploit memory access affinities within each transaction executor, given round-robin routing of transactions. On the other hand, shared-everything-with-affinity and shared-nothing-async both take advantage of access affinities and behave similarly. We see that shared-everything-with-affinity is slightly superior to shared-nothing-async. The difference lies in the relative costs in these deployments of sub-transaction invocations vs. direct memory accesses of data for remote warehouses. For a scale factor of one, there are no cross-reactor transactions, and the performance of the two deployments is identical. From a scale factor of two onwards, the probabilities of cross-reactor transactions range between 0% and 10%, as discussed in Section 4.3.1. In shared-nothing-async, a sub-transaction call is routed to its corresponding transaction executor, incurring context switching and communication overheads. By contrast, since shared-everything-with-affinity executes the sub-transaction in the same transaction executor, the remote call costs are traded off for the relatively smaller costs of cache pressure. We also ran the experiment with all the transaction classes in the TPC-C mix invoking sub-transactions synchronously in the shared-nothing deployment (the shared-nothing-sync configuration described in Section 3.3). However, the throughput and latency of this configuration were close (within the variance bars) to those of the shared-nothing-async configuration because of the low percentage of remote warehouse calls in the default TPC-C mix. We hence omit the curve from Figure 13 for brevity.

In short, the results indicate that ReactDB can be flexibly configured with different database architectures to achieve adequate transactional scalability in our available hardware setup for the standard TPC-C workload. Further, high affinity of data accesses to physical processing elements (cores) is crucial to performance.
Containerization Overheads. To account for the overhead of containerization, we also ran ReactDB while submitting empty transactions with concurrency control disabled. We observe a roughly constant overhead per transaction invocation across scale factors of around 22 µsec. We measured that thread switching between the worker and transaction executor across different cores is the dominant factor and depends on the machine used. The overhead accounted for 18% of average TPC-C transaction latency. Compared with executing the TPC-C application code directly within the database kernel, without worker threads generating transaction invocations separately from database processing threads (as in Silo), the overhead is significant. If, however, a database engine with kernel thread separation is assumed, as is the normal case, the overhead is negligible.
F PROCEDURE-LEVEL PARALLELISM WITH REACTORS

In this section, we revisit the example introduced in Figure 1 and evaluate the potential performance gains that can be obtained by leveraging procedure-level parallelism in the reactor model. In Figure 1(a), we see the formulation of auth_pay in the classic transactional model. If the orders relation were partitioned, the join query between provider and orders would be eligible for parallelization by a query optimizer in a traditional relational database. Note that the optimizer would consider query-level parallelization, and not a holistic procedure-level parallelization achievable by manual decomposition of the join in the reactor programming model as shown in Figure 1(b). In the latter, the potentially expensive computation of sim_risk would also be parallelized across reactors.

For this experiment, we configured ReactDB with up to 16 transaction executors. To focus on intra-transaction parallelism, a single worker generates auth_pay transaction invocations targeted at one Exchange and 15 Provider reactors, used to express three program execution strategies in ReactDB. Both the sequential and query-parallelism strategies encode the program of Figure 1(a) for the classic transactional model. In sequential, we deploy a single container and transaction executor for all reactors in ReactDB, and thus the whole procedure body executes in a single hardware thread. In query-parallelism, by contrast, we horizontally decompose orders across the Provider reactors by deploying one container and transaction executor for each reactor. Similar to a partitioned database, this results in a parallel foreign key join between providers and orders fragments, but sequential execution of the remainder of the procedure. Finally, the procedure-parallelism strategy encodes the program of Figure 1(b). Using the same deployment as for query-parallelism, this achieves holistic procedure parallelization in the reactor model. The inputs for auth_pay were chosen using uniform distributions, and the orders relation was loaded with 30,000 records per provider, mirroring the cardinality of the orders relation in the TPC-C benchmark. To simulate another transaction settling orders and keep the number of records scanned in the orders relation fixed, a pre-configured window of records is reverse range scanned ordered by time per provider. This window parameter has a direct effect on the benefit of query-parallelism compared with sequential. For clarity, we tuned this value to 800 records per provider to ensure that query-parallelism would outperform sequential by 4x when sim_risk is not invoked. We simulated the computational load of sim_risk by random number generation similar to Section 4.3.2. We also loaded our data values such that sim_risk is always invoked and transactions are not aborted due to application logic failures.
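To make the contrast between the strategies above more concrete, the sketch below outlines one way the procedure-parallelism formulation could decompose auth_pay: each Provider reactor scans its own orders fragment and evaluates sim_risk locally, so both the join and the risk computation proceed in parallel. This is an illustrative rendering of the decomposition described here, not the exact code of Figure 1(b); the call API, the helper names, and the summation used for aggregation are assumptions.

    # Illustrative sketch of a procedure-parallel auth_pay decomposition.
    def auth_pay_procedure_parallel(ctx, exchange, providers, window, payment):
        # Fan out to all Provider reactors: each computes its local risk
        # contribution over its own orders fragment.
        futures = [ctx.call_async(p, "risk_for_provider", window, payment)
                   for p in providers]
        # Aggregation by summation is assumed purely for illustration.
        total_risk = sum(f.get() for f in futures)
        # The Exchange reactor completes the procedure with the aggregate.
        return ctx.call(exchange, "authorize_payment", payment, total_risk)

    # Conceptually executed on each Provider reactor: scan the provider's
    # orders over the configured window and evaluate sim_risk on it.
    def risk_for_provider(ctx, window, payment):
        orders = ctx.scan_orders_desc_by_time(limit=window)
        return ctx.sim_risk(orders, payment)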
[Figure 14: Latency of query- vs. procedure-level parallelism. Avg latency (msec) vs. random numbers generated per provider (log scale, 10^1 to 10^6) for the sequential, query-parallelism, and procedure-parallelism strategies.]

Figure 14 shows average latencies as we vary the computational load in sim_risk by generating progressively more random numbers. We observe that procedure-parallelism is more resilient to changes in computational load and better utilizes the available parallel hardware in this benchmark. At the extreme of 10^6 random numbers generated per provider, the latency of procedure-parallelism is a factor of 8.14x and 8.57x lower than that of query-parallelism and sequential, respectively. The transaction executor core hosting the Exchange reactor becomes overloaded at 100% utilization under query-parallelism, while with procedure-parallelism utilization remains uniform at 84% across executor cores hosting Provider reactors and 14% in the Exchange executor core. In sequential, the transaction executor mapped to all reactors becomes overloaded at 100% even when the computational load is zero.