Reactors: A Case for Predictable, Virtualized Actor Database Systems

Vivek Shah, University of Copenhagen, Denmark, bonii@[Link]
Marcos Antonio Vaz Salles, University of Copenhagen, Denmark, vmarcos@[Link]
ABSTRACT

The requirements for OLTP database systems are becoming ever more demanding. Domains such as finance and computer games increasingly mandate that developers be able to encode complex application logic and control transaction latencies in in-memory databases. At the same time, infrastructure engineers in these domains need to experiment with and deploy OLTP database architectures that ensure application scalability and maximize resource utilization in modern machines. In this paper, we propose a relational actor programming model for in-memory databases as a novel, holistic approach towards fulfilling these challenging requirements. Conceptually, relational actors, or reactors for short, are application-defined, isolated logical actors that encapsulate relations and process function calls asynchronously. Reactors ease reasoning about correctness by guaranteeing serializability of application-level function calls. In contrast to classic transactional models, however, reactors allow developers to take advantage of intra-transaction parallelism and state encapsulation in their applications to reduce latency and improve locality. Moreover, reactors enable a new degree of flexibility in database deployment. We present ReactDB, a system design exposing reactors that allows for flexible virtualization of database architecture between the extremes of shared-nothing and shared-everything without changes to application code. Our experiments illustrate latency control, low overhead, and asynchronicity trade-offs with ReactDB in OLTP benchmarks.

ACM Reference Format:
Vivek Shah and Marcos Antonio Vaz Salles. 2018. Reactors: A Case for Predictable, Virtualized Actor Database Systems. In SIGMOD'18: 2018 International Conference on Management of Data, June 10-15, 2018, Houston, TX, USA. ACM, New York, NY, USA, 16 pages. [Link]

1 INTRODUCTION

Three trends are transforming the landscape of OLTP systems. First, a host of latency-sensitive OLTP applications has emerged in areas as diverse as computer games, high-performance trading, and the web [12, 49, 55]. This trend brings about challenging performance requirements, including mechanisms to allow developers to reason about transaction latencies and scalability of their applications with large data and request volumes [48, 53]. Second, database systems are moving towards solid state, in particular in-memory storage [32], and hardware systems are integrating increasingly more cores in a single machine. This trend brings about new requirements for database architecture, such as processing efficiency in multi-core machines and careful design of concurrency control strategies [52, 57]. Third, there is a need to operate databases out of virtualized infrastructures with high resource efficiency [8, 31, 40]. This trend leads to the requirement that virtualization abstractions for databases impose low overhead and allow for flexible deployments without causing changes to application programs.

Recent research in OLTP databases has shown that addressing all of these requirements is a hard problem. On the one hand, shared-nothing database designs, such as those of H-Store [50] or HyPer [30], fail to provide appropriately for multi-core efficiency in the presence of cross-partition transactions [20, 52]. This is due to the impact of overheads in mapping partitions to cores and of synchronous communication in distributed transactions across partitions. Consequently, transaction throughput and latencies in these systems are very sensitive to how data is partitioned. On the other hand, shared-everything databases have a hard time achieving multi-core scalability. To do so, these systems either internally partition their data structures, e.g., DORA [42], or benefit from affinity of memory accesses to cores in transactions, e.g., Silo [52]. Thus, deployment decisions can affect efficiency and scalability in these systems and are difficult to get right across application classes.

As a consequence, both developers and infrastructure engineers in demanding OLTP domains have a hard time controlling the performance of their transactional databases. Despite advances in profiling tools to identify causes of latency variance in database systems [29], today developers lack clear abstractions to reason at a high level about the interplay of complex, potentially parallelizable application logic and transaction latencies. In addition, the variety of modern in-memory database engines, including numerous specialized designs ranging internally from shared-nothing to shared-everything [43, 47, 56], challenges the ability of infrastructure engineers to flexibly experiment with and adapt database architecture without affecting application code.

Actor programming models provide desirable primitives for concurrent and distributed programming [2, 4, 27], which of late have evoked a strong interest in the database community [9]. To holistically meet the challenging requirements imposed on OLTP systems, we propose a new actor programming model in relational databases called Relational Actors (or reactors for short). Reactors are special types of actors that model logical computational entities encapsulating state abstracted as relations. For example, reactors can represent application-level scaling units such as accounts in a banking application or warehouses in a retail management application. Within a reactor, developers can take advantage of classic database programming features such as declarative querying over the encapsulated relations. To operate on state logically distributed across reactors, however, developers employ explicit asynchronous function calls. The latter allows developers of latency-sensitive OLTP applications to write their programs so as to minimize cross-reactor accesses or overlap communication with computation. Still, a transaction across multiple reactors provides serializability guarantees as in traditional databases, thus relieving developers from struggling with complex concurrency issues. Reactors allow application-level modeling between the extremes of a relational database (single reactor encapsulating all relations) and key-value stores (each reactor encapsulating a key-value pair).

To address the challenges of architectural flexibility and high resource efficiency in multi-core machines, we design an in-memory database system that exposes reactors as a programming model. This system, called ReactDB (RElational ACTor DataBase), decomposes and virtualizes the notions of sharing of compute and memory in database architecture. First, we introduce a database containerization scheme to enclose shared-memory regions in a machine, each storing state for one or many reactors. Second, within a container, compute resources abstracted as transaction executors can be deployed to either share or own reactors. The combination of these two notions allows infrastructure engineers to experiment with deployments capturing a range of database architecture patterns by simply changing a configuration file. At the same time, no changes are required to application code using reactors.

Example: Digital Currency Exchange. We abstract an application coded using a set of reactors as a reactor database. Consider a simplified digital currency exchange application, in which users may buy or sell currency through their credit card providers. Figure 1 contrasts how such an application would be written with a classic transactional database and a reactor database in parts (a) and (b), respectively. The exchange wishes to limit its settlement risk from cancelled credit card payments. To do so, it follows two application rules: (1) stop accepting orders if any single provider's total unsettled exposure goes above the p_exposure threshold in relation settlement_risk; (2) reject orders that cause the total risk-adjusted exposure across all providers to exceed the g_risk threshold. In part (a), this logic is encoded in a stored procedure that computes risk-adjusted exposure through an expensive function sim_risk and caches its result for a time period. In part (b), the same logic is expressed with reactors. The exchange and each of the providers are modeled as relational actors with private state (relations) that can execute certain procedures. The Exchange reactor can execute auth_pay and encapsulates information about provider names and the settlement limits. Provider reactors store information from risk-adjusted exposure calculations per provider as well as fragments of the relation orders with the payments for each provider, and can execute procedures calc_risk and add_entry. In auth_pay, the Exchange reactor invokes asynchronous calls to Provider reactors, making explicit the available intra-transaction parallelism. Since the exchange strives for the lowest latency possible, this program formulation improves transaction response time with respect to part (a) in a way that is clearly explainable to a developer pursuing application performance optimization. In addition, it becomes explicit in transaction programs that code is conceptually moved close to the data it touches, allowing developers to control for locality. At the same time, ACID properties are guaranteed for auth_pay, despite asynchronous calls to a function, calc_risk, that includes database updates, user-defined abort conditions, and potentially even nondeterminism in the calculation of sim_risk [5]. We further elaborate on this example and our model in Section 2.

How are reactors different from database partitioning? In contrast to database partitioning, which is a data-oriented optimization, reactors represent a compute-oriented abstraction. In the example above, reactors can be used to model horizontal partitioning of tuples, vertical partitioning of relations, or any arbitrary grouping of relation fragments. In addition, reactors allow for functional decomposition and modeling of affinity and parallelism in arbitrary application logic. In the example, detecting and fully exploiting the parallelism of auth_pay in the stored procedure formulation of part (a) in a classic relational database would require complex control-flow analysis, and may not be possible at all [45].

Contributions. In short, we make the following contributions:
(1) We present a novel logical abstraction for relational databases called reactors. This abstraction is grounded on transactional semantics offering serializability and an asynchronous programming model that allows for encoding complex application logic while considering relative latency of different transactional programs (Section 2).
(2) We discuss the design of ReactDB, an in-memory database system exposing reactors. ReactDB enables configuration of database architecture at deployment time in a multi-core machine without changes to application code (Section 3).
(3) In experiments with classic OLTP benchmarks, reactors provide latency control at the microsecond scale for varied program formulations. In addition, for given program formulations, database architecture can be configured to control the trade-off between asynchronicity and load (Section 4).
[Figure 1 is not reproduced legibly in this copy. Part (a) shows the stored procedure auth_pay over the relations settlement_risk, provider, and orders in the classic transactional model. Part (b) shows the reactor formulation: an Exchange reactor encapsulating settlement_risk and provider_names and exposing auth_pay, and one Provider reactor per provider (e.g., VISA_DK, MC_US), each encapsulating provider_info and a fragment of orders and exposing calc_risk and add_entry.]
Figure 1: A simplified currency exchange application in: (a) the classic transactional model, and (b) the reactor model.
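Because the listing in Figure 1(b) is not legible here, the following condensed sketch restates the reactor formulation as it is described in Sections 1 and 2.2. It follows the paper's pseudocode conventions; details such as the list operation results.append are our reconstruction and may differ from the original figure.

    reactor Exchange {
        -- encapsulates settlement_risk and provider_names
        void auth_pay(pprovider, pwallet, pvalue) {
            SELECT g_risk, p_exposure INTO risk, exposure FROM settlement_risk;
            results := [];
            -- fork: request each provider's risk-adjusted exposure in parallel
            foreach p_provider in (SELECT value FROM provider_names) {
                res := calc_risk(exposure) on reactor p_provider;  -- asynchronous call returning a future
                results.append(res);
            }
            -- join: sum up the futures
            total_risk := 0;
            foreach res in results
                total_risk := total_risk + res.get();
            if total_risk + pvalue < risk then
                add_entry(pwallet, pvalue) on reactor pprovider;   -- nested asynchronous call
            else abort;
            end if;
        }
    }

    reactor Provider {
        -- encapsulates provider_info and a fragment of orders
        float calc_risk(p_exposure) {
            SELECT SUM(value) INTO exposure FROM orders WHERE settled = 'N';
            if exposure > p_exposure then abort;
            SELECT risk, time, window INTO p_risk, p_time, p_window FROM provider_info;
            if p_time < now() - p_window then
                p_risk := sim_risk(my_name(), exposure);           -- expensive simulation, cached per time window
                UPDATE provider_info SET risk = p_risk, time = now();
            end if;
            return p_risk;
        }
        void add_entry(wallet, value) {
            INSERT INTO orders VALUES (wallet, value, 'N');
        }
    }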

2 PROGRAMMING MODEL

2.1 Reactor Concepts
In contrast to classic transactional or actor models, reactors bring together all of the following concepts:
(1) A reactor is an application-defined logical actor that encapsulates state abstracted using relations.
(2) Declarative queries are supported only on a single reactor. Communication across reactors is achieved by asynchronous function calls. A computation (function) across reactors consists of a sequence of intra-reactor statements and/or nested cross-reactor function calls.
(3) Computations across reactors have transactional guarantees.
(4) Reactors provide an abstract computational cost model for reflecting on relative latency across program formulations.

2.2 Programming with Reactors
2.2.1 Application-Defined Relational Actors. A reactor is an actor specialized for the management of state abstracted by the relational model. The pseudocode in Figure 2 conceptualizes the capabilities of a reactor. As a regular actor [2], a reactor encapsulates a state, which can be accessed by computations invoked on the reactor. However, unlike in a regular actor, in which communication is typically achieved by non-blocking send and blocking receive primitives, the only form of communication with a reactor is through asynchronous function calls returning promises [37]. Moreover, the code of such functions is similar to that of database stored procedures, which can intermix declarative queries over relations with other program logic and function calls. These asynchronous function calls are abstracted in Figure 2 by the execute function, which takes as an argument a function to be computed on the reactor's state along with appropriate arguments, and returns a promise representing the result of the computation. In the remainder of this paper, we refer to such a result as a future, and use the terms function and procedure on a reactor interchangeably.

    Reactor : Actor {
        RelationalState rs;
        Future execute(compute_fn, args) {
            return new Future(compute_fn(args, rs));
        }
    }
Figure 2: Conceptual view of reactors as an actor type.

A reactor database is a collection of reactors, each of which must abide by a reactor type. Reactor types, defined by application developers, specify the functions that should be invoked in a reactor and determine the relation schemas encapsulated in the reactor state. To instantiate a reactor database, we need to declare the names and types of the reactors constituting the database and a schema creation function for each reactor type. The developer cannot create or destroy reactors; these purely logical entities are accessible by their declared names for the lifetime of the application, bound by the failure model of the reactor database. In contrast to objects in an object-oriented database [13, 34], reactors are active computational entities, i.e., a reactor is a combination of a logical thread of control and an encapsulated relational state accessible exclusively by that logical thread. While objects encapsulate complex types, reactors encapsulate whole relational schemas; declarative querying happens only within, not across reactors, and communication across reactors is explicit through asynchronous function calls.

In the example of Figure 1(b), the state of a Provider reactor results from both horizontal and vertical fragmentation of the original providers relation from part (a) as well as horizontal fragmentation of the orders relation (with omission of the provider column). The state of the Exchange reactor retains a relation provider_names, since the latter is necessary for accessing Provider reactors, and the settlement_risk relation. This illustrates that different reactors may contain either the same or different schemas, according to their types.

It is not necessary to know in advance all the providers and their names to model the reactor database. It is sufficient to know: (1) the types of the reactors expected, namely Exchange and Provider; (2) the schemas and functions of each reactor type; (3) the name mapping to address provider reactors. As such, adding new providers does not necessitate rewriting the application logic.

2.2.2 Asynchronous Function Calls. To invoke a procedure on a reactor, we must explicitly use the declared name of the reactor where the computation must be executed. The procedure logic can access the relational state on the reactor where it is invoked through declarative queries. If the procedure needs to access the state of another reactor, then it must invoke another procedure on the target reactor. This is necessary because the states of different reactors are disjoint. Since the result of a procedure is represented by a future, the calling code can choose to wait for the result of the future, invoke procedures on other reactors, or execute further application logic. This flexibility allows application developers to expose parallelism within the procedure.

In Figure 1(b), the syntax procedure_name(args) on reactor reactor_name specifies an asynchronous procedure call routed to the reactor with a given name. In the logic of auth_pay, the execution of calc_risk is overlapped on each of the Provider reactors and the futures returned by the calls are stored in the results list. The Exchange reactor sums up the total risk by accessing the value of each future by invoking get() on the future object. If the total risk is within the allowed risk, then the exchange reactor performs another nested asynchronous procedure call to add_entry on the provider reactor with name given as a parameter to auth_pay. This call results in adding an order at the target provider reactor.

Does the reactor programming model necessitate manual optimization? We posit that reactors can act as a bridging model between a classic database abstraction and a key-value API. Reactors provide the possibility for developers to navigate the extremes between a single-reactor database with full SQL support and a radical decomposition of individual tuples as reactors. We envision that the typical application modeling will be a hybrid between these two extremes, balancing reuse of query optimization machinery with low-level performance control. The popularity of NoSQL databases points to the need and willingness among application developers to obtain higher performance control and scalability for their applications even at the cost of sacrificing traditional database features such as query optimization and transaction support.

2.2.3 Reactor Consistency using Transactional Semantics. To guarantee consistency of the state encapsulated by a reactor database, the semantics of procedure invocations on reactors is transactional. We differentiate between top-level and nested asynchronous procedure calls. Top-level calls are executed by clients on a reactor and are termed interchangeably transactions or root transactions. Transactions respect the classic ACID properties [10]. We denote a concrete execution $i$ of a transaction by $T_i$.

Nested asynchronous procedure calls are executed by a reactor on another reactor. Since these calls must always occur within the overall context of a root transaction, they are called sub-transactions. We denote a concrete execution of a sub-transaction $j$ of transaction $T_i$ on a reactor $k$ by $ST^k_{i,j}$. Sub-transactions allow programmers to structure their computations for performance, allowing for concurrent computation on (logically) distributed state among reactors. Sub-transactions are not used, however, to allow for partial commitment. Any condition leading to an abort in a sub-transaction leads to the abort of the corresponding root transaction. This approach towards the semantics of nested calls is exactly the reverse of what is adopted in classic systems such as Argus [36], reflecting our focus on leveraging the high degree of physical parallelism in modern commodity hardware for transactional processing as opposed to managing faults in settings with a high degree of physical distribution (e.g., geo-distribution) as in previous work. A transaction or sub-transaction completes only when all its nested sub-transactions complete. This frees the client logic from explicitly synchronizing on the result of a sub-transaction invocation if it does not need the result of the sub-transaction.

For example, in Figure 1(b), the auth_pay procedure does not wait on the result of the add_entry procedure call, since the programming model guarantees that the transaction corresponding to auth_pay only completes when all its sub-transactions complete. In addition, auth_pay aborts if any of the asynchronous calls to calc_risk raises an abort due to excessive provider exposure.

Any program in the classic transactional model can be trivially remodeled in the reactor programming model by specifying a single reactor. For example, we could model a single exchange reactor with the schema and application logic shown in Figure 1(a). However, the benefits of our programming model are only fully achieved when developers of latency-sensitive OLTP applications remodel their logic as done in Figure 1(b). In particular, in the reformulated logic, intra-transaction parallelism is exposed. Furthermore, the trade-off between scalability on the number of provider reactors and latency of executing the logic of auth_pay becomes explicit.

Is there a development methodology to architect an application using reactors? An interesting avenue for future research is to explore an analytical machinery for modeling and comparing the quality of reactor database designs, similar to the entity relationship model and decomposition of universal relations by functional dependencies in classic relational databases. Although answering this question is beyond the scope of this paper, we envision that developers could start from a single reactor type with the whole relational schema and all application functions. Through an iterative process of performance analysis and decomposition of application functionality into multiple reactor types, developers can improve latency of their applications through cross-reactor communication, and also identify inherent scalability limitations by analyzing the degree of locality in application logic.

2.2.4 Intra-Transaction Safety. Introducing asynchrony in a transactional abstraction is not trivial. Since asynchronicity exposes intra-transaction parallelism, race conditions could arise when sub-transactions that conflict on a data item are invoked asynchronously on the same reactor. Moreover, such invocations would violate the illusion that a reactor is a computational entity with a single logical thread of control. To avoid these issues, we must enforce that at most one execution context is active for a given reactor and root transaction at any time.

First, we enforce that whenever a reactor running a procedure directly executes a nested procedure invocation on itself, the nested invocation is executed synchronously. This policy corresponds to inlining the sub-transaction call, resulting in future results being immediately available. To deal with nested asynchronous invocations, we define the active set of a reactor $k$ as the set of sub-transactions, regardless of corresponding root transaction, that are currently being executed on reactor $k$, i.e., have been invoked, but have not completed. Thus, the runtime system must conservatively disallow execution of a sub-transaction $ST^k_{i,j}$ when:
$$\exists\, ST^k_{i,j'} \in active\_set(k) \;\wedge\; j' \neq j$$
This dynamic safety condition prohibits programs with cyclic execution structures across reactors and programs in which different paths of asynchronous calls lead to concurrent sub-transactions on the same reactor. By introducing this safety condition, the runtime conservatively assumes that conflicts may arise in asynchronous accesses to the same reactor state within a transaction, and thus aborts any transaction with such dangerous structures.

By leveraging this dynamic safety condition, we envision that appropriate testing of transaction logic at development time will be sufficient to root out most, if not all, dangerous structures from the code of latency-sensitive OLTP applications. However, formalizing static program checks to aid in detection of dangerous call structures among reactors is an interesting direction for future work.
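As an illustration of the safety condition (our example, not from the paper), consider a procedure on a reactor A that asynchronously invokes two different sub-transactions on the same reactor B within one root transaction; the procedure names debit and credit are hypothetical:

    reactor A {
        void unsafe() {
            f1 := debit(10) on reactor B;    -- sub-transaction ST_{i,1} enters B's active set
            f2 := credit(10) on reactor B;   -- ST_{i,2} targets B while ST_{i,1} may still be active:
                                             -- the safety condition is violated and the runtime aborts T_i
            f1.get(); f2.get();
        }
        void safe() {
            f1 := debit(10) on reactor B;
            f1.get();                        -- ST_{i,1} completes and leaves the active set
            f2 := credit(10) on reactor B;   -- now the only sub-transaction active on B for T_i
            f2.get();
        }
    }

Merging the two calls into a single procedure on B would avoid both the abort and one round of cross-reactor communication.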
2.3 Conflict-Serializability of Transactions
To formalize the correctness of concurrent executions of transactions in reactors, we show equivalence of serializable histories in the reactor model to serializable histories in the classic transactional model. We restrict ourselves exclusively to the notion of conflict-serializability. Technically, our formalization is similar to reasoning on nonlayered object transaction models [54].

2.3.1 Background. We first review the formalism introduced by Bernstein et al. [10, page 27] for the classic transactional model and introduce relevant notation. In this model, the database consists of a collection of named data items, and transactions encapsulate a sequence of operations. A transaction $T_i$ is formalized as a partial ordering of operations with an ordering relation $<_i$ and comprises a set of operations. Operations include reads and writes, along with either a commit or an abort. A read from a data item $x$ is denoted $r_i[x]$, a write to $x$ denoted $w_i[x]$, while a commit is denoted $c_i$ and an abort $a_i$. The ordering relation $<_i$ orders conflicts. Two operations conflict iff at least one of them is a write and both of them reference the same named item. We assume that a transaction does not contain multiple operations of the same type to the same named data item as in [10, page 27] without any impact on the results.

2.3.2 Reactor Model. Without loss of generality, we assume reactor names and sub-transaction identifiers to be drawn from the set of natural numbers. Recall that we denote a sub-transaction $j$ in transaction $T_i$ on reactor $k$ by $ST^k_{i,j}$. $r^k_{i,j}[x]$ denotes a read from data item $x$, and $w^k_{i,j}[x]$ denotes a write to data item $x$ in $ST^k_{i,j}$. Note that data items in different reactors are disjoint. Using this notation, we can define a sub-transaction in the reactor model as follows.

Definition 2.1. A sub-transaction $ST^k_{i,j}$ is a partial order with ordering relation $<_{i,j}$ where,
(1) $ST^k_{i,j} \subseteq \{\, r^k_{i,j}[x],\ w^k_{i,j}[x],\ ST^{k'}_{i,j'} \mid x$ is a data item in $k$, $j'$ is a sub-transaction identifier s.t. $j' \neq j \,\}$;
(2) Let
$basic\_ops(r^k_{i,j}[x]) = \{r^k_{i,j}[x]\}$,
$basic\_ops(w^k_{i,j}[x]) = \{w^k_{i,j}[x]\}$,
$basic\_ops(ST^k_{i,j}) = \bigcup \{\, basic\_ops(o) \mid o \in ST^k_{i,j} \,\}$;
if $o_1 \in ST^k_{i,j} \wedge o_2 \in ST^k_{i,j} \wedge r^{k'}_{i,j'}[x] \in basic\_ops(o_1) \wedge w^{k'}_{i,j''}[x] \in basic\_ops(o_2)$, then either $o_1 <_{i,j} o_2$ or $o_2 <_{i,j} o_1$.

Note that the ordering relation $<_{i,j}$ of a sub-transaction establishes order according to conflicts in leaf-level basic operations, potentially nested in sub-transactions. These leaf-level basic operations are determined by the function $basic\_ops$ in the definition.

Definition 2.2. A transaction $T_i$ is a partial order with ordering relation $<_i$ where,
(1) $T_i \subseteq \{\, ST^k_{i,j} \,\} \cup \{a_i, c_i\}$;
(2) $a_i \in T_i \iff c_i \notin T_i$;
(3) if $t$ is $c_i$ or $a_i$ (whichever is in $T_i$), then for every other operation $p \in T_i$, $p <_i t$;
(4) if $o_1 \in T_i \wedge o_2 \in T_i \wedge o_1, o_2 \notin \{a_i, c_i\} \wedge r^{k'}_{i,j'}[x] \in basic\_ops(o_1) \wedge w^{k'}_{i,j''}[x] \in basic\_ops(o_2)$, then either $o_1 <_i o_2$ or $o_2 <_i o_1$.

Formally, a transaction comprises exclusively sub-transactions, and the relation $<_i$ orders sub-transactions according to conflicts in their nested basic operations. In the reactor model, two sub-transactions conflict iff the basic operations of at least one of them contain a write and the basic operations of both of them reference the same named item in the same reactor. Under this extended notion of a conflict, the definition of history, serial history, equivalence of histories and serializable history in the reactor model are the same as their definitions in the classic transactional model [10], but with sub-transactions replacing basic operations. Similar to nested transaction models [6], we then wish to establish an equivalence between serializability of transactions in the reactor model and serializability in the classic transactional model. To do so, we proceed by defining an appropriate projection of the reactor model into the classic transactional model.

Definition 2.3. The projection of a basic operation $o$ from the reactor model to the classic transactional model, denoted by $P(o)$, is defined as:
(1) $P(r^k_{i,j}[x]) = r_i[k \circ x]$
(2) $P(w^k_{i,j}[x]) = w_i[k \circ x]$
(3) $P(c_i) = c_i$
(4) $P(a_i) = a_i$
where $\circ$ denotes concatenation.

The definition provides a name mapping from the disjoint address spaces of reactors to a single address space, which is done by concatenating the reactor identifier with the name for a data item.

Definition 2.4. The projection of a sub-transaction $ST^k_{i,j}$ from the reactor model to the classic transactional model, denoted by $P_S(ST^k_{i,j})$, is a partial order with ordering relation $<_S^{i,j}$:
(1) $P_S(ST^k_{i,j}) \subseteq \{\, P(o) \mid o \in basic\_ops(ST^k_{i,j}) \,\}$;
(2) if $o_1 \in ST^k_{i,j} \wedge o_2 \in ST^k_{i,j} \wedge o_1 <_{i,j} o_2 \wedge o_1, o_2$ are reads or writes, then $P(o_1) <_S^{i,j} P(o_2)$;
(3) if $ST^{k'}_{i,j'} \in ST^k_{i,j}$, then $<_S^{i,j}$ is extended by $<_S^{i,j'}$;
(4) if $o_1 \in ST^k_{i,j} \wedge o_1$ is a read or a write $\wedge\ ST^{k'}_{i,j'} \in ST^k_{i,j} \wedge o_1 <_{i,j} ST^{k'}_{i,j'}$, then $P(o_1) <_S^{i,j} P_S(ST^{k'}_{i,j'})$;
(5) if $ST^{k'}_{i,j'} \in ST^k_{i,j} \wedge o_2 \in ST^k_{i,j} \wedge o_2$ is a read or a write $\wedge\ ST^{k'}_{i,j'} <_{i,j} o_2$, then $P_S(ST^{k'}_{i,j'}) <_S^{i,j} P(o_2)$;
(6) if $ST^{k'}_{i,j'} \in ST^k_{i,j} \wedge ST^{k''}_{i,j''} \in ST^k_{i,j} \wedge ST^{k'}_{i,j'} <_{i,j} ST^{k''}_{i,j''}$, then $P_S(ST^{k'}_{i,j'}) <_S^{i,j} P_S(ST^{k''}_{i,j''})$.
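For illustration (our example, not from the paper), let transaction $T_1$ contain a sub-transaction $ST^5_{1,1}$ on reactor 5 that reads item $x$ and then invokes a nested sub-transaction $ST^7_{1,2}$ on reactor 7 that writes item $y$. Definitions 2.3 and 2.4 flatten the nesting into operations over a single address space:
$$P(r^5_{1,1}[x]) = r_1[5 \circ x], \qquad P(w^7_{1,2}[y]) = w_1[7 \circ y],$$
$$P_S(ST^7_{1,2}) = \{\, w_1[7 \circ y] \,\}, \qquad P_S(ST^5_{1,1}) = \{\, r_1[5 \circ x],\ w_1[7 \circ y] \,\} \ \text{with} \ r_1[5 \circ x] <_S^{1,1} w_1[7 \circ y].$$
The projection of $T_1$ itself (Definition 2.5 below) additionally places the commit or abort after all of these operations.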
Definition 2.5. The projection of a transaction $T_i$ from the reactor model to the classic transactional model, denoted by $P_T(T_i)$, is a partial order with ordering relation $<_{T_i}$:
(1) $P_T(T_i) \subseteq (\bigcup_{ST^k_{i,j} \in T_i} P_S(ST^k_{i,j})) \cup \{\, P(o) \mid o \in T_i \wedge o$ is a commit or abort$\,\}$;
(2) $<_{T_i}$ is extended by $\bigcup_{ST^k_{i,j} \in T_i} <_S^{i,j}$;
(3) if $ST^k_{i,j} \in T_i \wedge ST^{k'}_{i,j'} \in T_i \wedge ST^k_{i,j} <_i ST^{k'}_{i,j'} \wedge o_1 \in P_S(ST^k_{i,j}) \wedge o_2 \in P_S(ST^{k'}_{i,j'})$, then $o_1 <_{T_i} o_2$;
(4) if $t$ is $c_i$ or $a_i$ (whichever is in $P_T(T_i)$), for any further operation $p \in P_T(T_i)$, $p <_{T_i} t$.

The definitions unroll all sub-transactions in the reactor model into read and write operations in the classic transactional model while maintaining ordering constraints.

Definition 2.6. The projection $P(H)$ of a history $H$ over a set of transactions $T = \{T_1, T_2, ..., T_n\}$ from the reactor model to the classic transactional model is a partial order with ordering relation $<_{P(H)}$ over a set of transactions $T' = \{P_T(T_1), P_T(T_2), ..., P_T(T_n)\}$ iff:
(1) $P(H) = \bigcup_{i=1}^{n} P_T(T_i)$;
(2) $<_{P(H)}$ is extended by $\bigcup_{i=1}^{n} <_{T_i}$;
(3) if $o_1 \in P_S(ST^k_{i,j}) \wedge o_2 \in P_S(ST^{k'}_{i',j'}) \wedge ST^k_{i,j} \in T_i \wedge ST^{k'}_{i',j'} \in T_{i'} \wedge ST^k_{i,j} <_H ST^{k'}_{i',j'}$, then $o_1 <_{P(H)} o_2$ as long as $o_1$ and $o_2$ conflict.

Theorem 2.7. A history $H$ is serializable in the reactor model iff its projection $H' = P(H)$ in the classic transactional model is serializable.
Proof. See Appendix A.

Theorem 2.7 implies that, with appropriate care, a scheduler for the classic transactional model can be used to implement one for the reactor model. In Section 3, we reuse an optimistic concurrency control (OCC) scheduler [52] and two-phase commit (2PC) [10].

2.4 Computational Cost Model
In this section, we introduce a cost model to support developers of latency-sensitive applications in controlling the latency of a transaction program expressed using reactors. Clearly, latency depends heavily on program structure. For example, certain programs can overlap asynchronous invocations of functions in other reactors with processing logic; other programs may do so only conditionally, or have data dependencies between different asynchronous function calls. For concreteness, we focus on a subset of programs modeled after a fork-join parallelism pattern [35].

Fork-Join Sub-Transactions. We call a sub-transaction fork-join if it comprises: (a) sequential logic, potentially involving synchronous calls to sub-transactions; (b) parallel logic consisting of calls to children fork-join sub-transactions such that all asynchronous invocations happen simultaneously at one given program point only, are potentially overlapped with synchronous logic, and then all future results are collected. Consider one such sub-transaction $ST^k_{i,j}$. We call $sync_{seq}(ST^k_{i,j})$ its sequence of children sub-transactions and $P_{seq}(ST^k_{i,j})$ its processing logic executed sequentially. To capture communication costs, we term $C_s(k, k')$ the cost to send a sub-transaction call from reactor $k$ to reactor $k'$, and $C_r(k', k)$ the cost to receive a result from $k'$ at $k$. The sequence of children sub-transactions of $ST^k_{i,j}$ executed asynchronously is denoted $async(ST^k_{i,j})$. The synchronous children sub-transactions and processing logic overlapped with the asynchronous sub-transactions are represented by $sync_{ovp}(ST^k_{i,j})$ and $P_{ovp}(ST^k_{i,j})$, respectively. Given a sequence $S$ of sub-transactions, $prefix(S, ST^k_{i,j})$ denotes the sequence of sub-transactions in $S$ up to $ST^k_{i,j}$ and including it. Moreover, we say that $dest(ST_1, \ldots, ST_n)$ represents the sequence of reactors that sub-transactions $ST_1, \ldots, ST_n$ execute on.

$$
\begin{aligned}
L(ST^k_{i,j}) \;=\; & P_{seq}(ST^k_{i,j}) \;+ \sum_{ST^{k'}_{i,j'} \in\, sync_{seq}(ST^k_{i,j})} L(ST^{k'}_{i,j'}) \;+ \sum_{k' \in\, dest(sync_{seq}(ST^k_{i,j}))} \bigl( C_s(k,k') + C_r(k',k) \bigr) \\
& + \max\Biggl( \max_{ST^{k'}_{i,j'} \in\, async(ST^k_{i,j})} \Bigl[ L(ST^{k'}_{i,j'}) + C_r(k',k) + \sum_{k'' \in\, dest(prefix(async(ST^k_{i,j}),\, ST^{k'}_{i,j'}))} C_s(k,k'') \Bigr], \\
& \qquad\qquad P_{ovp}(ST^k_{i,j}) \;+ \sum_{ST^{k'}_{i,j'} \in\, sync_{ovp}(ST^k_{i,j})} L(ST^{k'}_{i,j'}) \;+ \sum_{k' \in\, dest(sync_{ovp}(ST^k_{i,j}))} \bigl( C_s(k,k') + C_r(k',k) \bigr) \Biggr)
\end{aligned}
$$
Figure 3: Modeling latency cost of a fork-join sub-transaction in the reactor model.

Now, the latency cost of $ST^k_{i,j}$ is modeled by the formula in Figure 3, assuming the parallelism in asynchronous sub-transactions is fully realized. The same formula can be applied recursively to compute the latency cost for sub-transactions of arbitrary depth. Since a root transaction is a special case of a sub-transaction, i.e., a sub-transaction without a parent, the same formula applies, modulo any overheads incurred for commitment.

Uses. Many applications can be written in the reactor model with fork-join sub-transactions. Consider the transaction auth_pay in Figure 1(b). While it is not fork-join as presented, it can be easily decomposed into two sequentially invoked fork-join sub-transactions, namely one to calculate total_risk and one to conditionally call add_entry. All the benchmarks evaluated in our experiments can also be expressed with fork-join sub-transactions. Notwithstanding, the reactor programming model allows for any synchronization pattern with futures respecting the conditions of Section 2.2.4, and is not limited to fork-join sub-transactions. The cost model of Figure 3 can be extended to cover other program classes, if necessary.

We envision that developers may employ cost modeling as in Figure 3 in a way similar to algorithmic complexity measures, e.g., [1], to compare alternative formulations for transaction programs. Developers can improve the latency of their sub-transactions by: (1) increasing asynchronicity of children sub-transactions, (2) overlapping execution of application logic by introducing sub-transactions, and (3) reducing the processing cost of the application logic.
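As a worked illustration (ours, not from the paper), consider a fork-join root transaction on reactor $k$ with sequential processing cost $P_{seq}$ that invokes one child sub-transaction on each of $n$ distinct reactors, each child costing $L_c$, with uniform communication costs $C_s$ and $C_r$ and no overlapped processing. Instantiating the equation of Figure 3 with all children in $sync_{seq}$ (fully synchronous) versus all children in $async$ (fully asynchronous) gives:
$$L_{sync} = P_{seq} + n\,(L_c + C_s + C_r), \qquad L_{async} = P_{seq} + \max_{1 \le m \le n}\,(L_c + C_r + m\,C_s) = P_{seq} + L_c + C_r + n\,C_s.$$
The asynchronous variant still pays the $n$ send costs sequentially, but overlaps the children's execution and all but one receive, which is the kind of difference the multi-transfer formulations of Section 4.2 are designed to expose.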
Finally, developers can reason about scalability by considering how data, computation and communication are spread among reactors. In particular, if developers architect their applications such that increasing amounts of data and computation are distributed among increasing numbers of reactors while at the same time keeping the number of cross-reactor calls roughly constant per transaction, then adequate transactional scalability should be expected.

Limitations. Again similar to algorithmic complexity measures, a cost model such as the one in Figure 3 is exposed to limitations in the estimation of its various parameters for concrete fork-join transaction programs and system realizations of the reactor model. It may be impossible to estimate reliably how costly processing will be or how many sub-transaction invocations will fall into $sync_{seq}(ST^k_{i,j})$, $sync_{ovp}(ST^k_{i,j})$, and $async(ST^k_{i,j})$, as these parameters may be determined by complex program logic and data dependencies. In addition, the concrete system realization may not express the full parallelism encoded in the program. Moreover, determination of cost model parameters such as $C_s$ and $C_r$ may be compromised by measurement errors. Finally, the cost model considers transaction programs in isolation, and does not include interference or queueing effects. Notwithstanding, we remark in our experimental evaluation that under certain conditions, the cost model can closely capture latency differences between alternative transaction programs.

3 SYSTEM ARCHITECTURE

3.1 Overview
In this section, we discuss the architecture of ReactDB, an in-memory database system that exposes the reactor programming model. The design of ReactDB aims at providing control over the mapping of reactors to physical computational resources and memory regions under concurrency control. The system implementation currently targets a single multi-core machine for deployment; however, ReactDB's architecture is designed to allow for deployments in a cluster of machines, which we leave for future work.

[Figure 4 is not reproduced in this copy. It depicts client code submitting transactions through transport drivers to database containers, each holding transaction routers, transaction executors with request queues and thread pools, a transaction coordinator, and storage for one or more reactors.]
Figure 4: ReactDB's architecture.

As shown in Figure 4, ReactDB's architecture is organized as a collection of database containers. A container abstracts a (portion of a) machine with its own storage (main memory) and associated mechanisms for transactional consistency. Each container is isolated and does not share its data with other containers. Containers are associated with computational resources (cores) disjoint from other containers, abstracted by transaction executors. A transaction executor consists of a thread pool and a request queue, and is responsible for executing requests, namely asynchronous procedure calls. Each transaction executor is pinned to a core. Single-container transactions are managed by the concurrency control mechanism within the container, while a transaction coordinator runs a commitment protocol for transactions spanning multiple containers. Transactional durability is currently unavailable in our implementation, but could be achieved by a combination of techniques such as fast log-based recovery [59] and distributed checkpoints [24]. Alternatively, an approach such as FaRM's could be employed to minimize any impact of durability mechanisms on latency [22].

Each container stores a two-level mapping between a reactor and a transaction executor. On the first level, a reactor is mapped to one and only one container. Together with appropriate container deployment, this constraint ensures that asymmetrically large communication costs are only introduced between, but not within, reactors, in line with our computational cost model. On the second level, a reactor can be mapped to one or more transaction executors in a container. Transaction routers decide the transaction executor that should run a transaction or sub-transaction according to a given policy, e.g., round-robin or affinity-based.

Transport drivers handle communication across containers. ReactDB has a driver component that is used by client code to send transactions into the system for processing. ReactDB accepts pre-compiled stored procedures written in the reactor programming model in C++ against a record manager interface. An instance of a pre-compiled stored procedure and its inputs forms a transaction.

3.2 Concurrency Control
3.2.1 Single Container Transactions. Every transaction or sub-transaction written in the reactor programming model specifies the reactor where it must be executed. If the destination reactor of a child sub-transaction is hosted in the same container as the parent sub-transaction, the child sub-transaction is executed synchronously within the same transaction executor to minimize the communication overhead of migrating across transaction executors. If all the sub-transactions in the execution context of a root transaction are executed within one container, then the native concurrency control mechanism of the container is used to guarantee serializability. As a consequence of Theorem 2.7, ReactDB can reuse an existing concurrency control mechanism, and we chose Silo's high-performance OCC implementation [52].
3.2.2 Multi-Container Transactions. When a sub-transaction is invoked on a reactor mapped to a container different than the current container, the call is routed by the transport driver to the destination container and then by the transaction router to the request queue of a transaction executor. Once the sub-transaction is queued, the calling code gets a future back representing this computation. If the calling sub-transaction code does not synchronize on the future, then once the caller completes, ReactDB enforces synchronization on the futures of all child sub-transactions. By the above synchronization policy, a root transaction can finish when all the sub-transactions created and invoked in its context finish, recursively. The transaction executor then invokes the transaction coordinator to initiate a commitment protocol across the containers that have been touched by the transaction, either directly or by any of its nested sub-transactions. The transaction coordinator in turn performs a 2PC protocol. The first phase of the protocol triggers validation of Silo's OCC protocol on all the involved containers, during which locks are acquired on the write-set of the transaction. If any of the validations fail, the second phase of the protocol ensures that the transaction is aborted on all containers. Otherwise, the write phase of Silo's OCC scheme is triggered. In either case, all locks are released appropriately [52].

3.2.3 Thread Management. To minimize the effect of stalls due to synchronization, each transaction executor maintains a thread pool to process (sub-)transactions. A configurable number of threads are allowed to become active and drain the transaction executor's request queue, thus controlling the multi-programming level (MPL) per executor. In addition, the threads use cooperative multitasking to minimize context switching overheads. A thread blocks if it tries to access the result of a sub-transaction invoked on a different container and the result is not yet available. In such a situation, it notifies another thread to take over processing of the request queue and goes back to the thread pool when the (sub-)transaction being executed by it is completed.
3.3 Deployments
Configuration of transaction executors and containers allows infrastructure engineers to flexibly deploy ReactDB in a number of database architectures. In the remainder of the paper, we restrict ourselves to three main deployment strategies:

(S1) shared-everything-without-affinity: This strategy employs a single container in which each transaction executor can handle transactions on behalf of any reactor. ReactDB is configured with a round-robin router to load balance transactions among executors. All sub-transactions are executed within the same transaction executor to avoid any migration of control overhead. This strategy adheres to the architecture of most shared-everything databases [28].

(S2) shared-everything-with-affinity: This strategy is similar to shared-everything-without-affinity in that it employs a single container, but with the difference that an affinity-based router ensures that root transactions for a given reactor are processed by the same transaction executor. In sub-transaction calls, even if to different reactors, no migration of control happens, and the sub-transaction is executed by the same transaction executor of the root transaction. This deployment strategy closely adheres to the setup employed in the evaluation of Silo [52].

(S3) shared-nothing: This strategy employs as many containers as transaction executors, and a given reactor is mapped to exactly one transaction executor. While this strategy aims at maximizing program-to-data affinity, sub-transaction calls to different reactors may imply migration of control overheads to other transaction executors. We further decompose this configuration into shared-nothing-sync and shared-nothing-async, depending on how sub-transactions are invoked within application programs. In the former option, sub-transactions are invoked synchronously by calling get on the sub-transaction's future immediately after invocation. In the latter, the call to get is delayed as much as possible for maximal overlapping of application logic with sub-transaction calls. From an architecture perspective, both of these setups represent a shared-nothing deployment with differing application programs exercising different synchronization options. The shared-nothing-sync strategy models the setup of shared-nothing databases such as H-Store [50] and HyPer [30], albeit with a different concurrency control protocol due to potentially higher MPL per executor. The shared-nothing-async strategy allows ReactDB to further leverage intra-transaction parallelism as provided by the reactor programming model, exploiting cooperative multitasking.

Other flexible deployments, similar to [44], are possible as well. To change database architecture, only configuration files need to be edited and the system bootstrapped, without changes to application logic operating on reactors.
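The configuration file format is not shown in the paper; the following purely illustrative sketch indicates the kind of deployment description that would distinguish S1-S3 on, say, a four-core machine:

    -- S1: shared-everything-without-affinity
    containers = 1; executors_per_container = 4; router = round-robin
    -- S2: shared-everything-with-affinity
    containers = 1; executors_per_container = 4; router = affinity
    -- S3: shared-nothing (sync vs. async is chosen in the application program, not here)
    containers = 4; executors_per_container = 1; each reactor mapped to exactly one container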
4 EVALUATION
In this section, we evaluate the effectiveness of ReactDB and the reactor programming model. The experiments broadly aim at validating the following hypotheses: (H1) The reactor programming model allows for reasoning about latency in alternative formulations of application programs (Section 4.2.1). (H2) The computational cost model of reactors can be efficiently realized by ReactDB, subject to its limitations (Section 4.2.2 and Appendix B). (H3) ReactDB allows for configuration of database architecture, without any changes to application code, so as to exploit asynchronicity in transactions depending on the level of load imposed on the database (Section 4.3 and Appendices C and D). We present additional evidence of ReactDB's transactional scale-up capability in Appendix E. Furthermore, we evaluate procedure-level parallelism with reactors based on the scenario of Figure 1 in Appendix F.

4.1 Experimental Setup
Hardware. For our latency measurements in Section 4.2 and Appendix B, we employ a machine with one four-core, 3.6 GHz Intel Xeon E3-1276 processor with hyperthreading, leading to a total of eight hardware threads. Each physical core has a private 32 KB L1 cache and a private 256 KB L2 cache. All the cores share a last-level L3 cache of 8 MB. The machine has 32 GB of RAM and runs 64-bit Linux 4.1.2. A machine with high clock frequency and uniform memory access was chosen for these experiments to challenge our system's ability to reflect low-level latency asymmetries in modern hardware as captured by our programming model.

For Section 4.3 and Appendices C-F, we use a machine with two sockets, each with 8-core 2.1 GHz AMD Opteron 6274 processors including two physical threads per core, leading to a total of 32 hardware threads. Each physical thread has a private 16 KB L1 data cache. Each physical core has a private 64 KB L1 instruction cache and a 2 MB L2 cache. Each of the two sockets has a 6 MB L3 cache. The machine has 125 GB of RAM in total, with half the memory attached to each of the two sockets, and runs 64-bit Linux 4.1.15. The higher number of hardware threads and accentuated cache coherence and cross-core synchronization effects allow us to demonstrate the effect of virtualization of database architecture in experiments that vary transaction load and asynchrony.

Methodology. An epoch-based measurement approach similar to Oltpbench is used [21]. Average latency or throughput is calculated across 50 epochs and the standard deviation is plotted in error bars. All measurements include the time to generate transaction inputs.

Workloads and Deployments. For the experiments of Section 4.2, we implement an extended version of the Smallbank benchmark mix [3]. Smallbank simulates a banking application where customers access their savings and checking accounts. Oltpbench first extended this benchmark with a transfer transaction, which is implemented by a credit to a destination account and a debit from a source account [26]. We extend the benchmark further with a multi-transfer transaction. Multi-transfer simulates a group-based transfer, i.e., multiple transfers from the same source to multiple destinations. Thus, by varying the number of destination accounts for multi-transfer and controlling the deployment of ReactDB, we can vary both the amount of processing in the transaction as well as the amount of cross-reactor accesses that the transaction makes.

Each customer is modeled as a reactor. We configure ReactDB with 7 database containers, each hosting a single transaction executor for a total of 7 transaction executors mapped to 7 hardware threads. The deployment plan of ReactDB is configured so that each container holds a range of 1000 reactors. A single worker thread is employed to eliminate interference effects and allow us to measure latency overheads of single transactions. The worker thread generating transaction inputs and invocations is allocated in a separate worker container and pinned to the same physical core hosting the container responsible for the first range, but in a separate hardware thread. In order to keep our measurements comparable, the multi-transfer transaction input generator always chooses a source customer account from this first container.

The experiments of Section 4.3 use the classic TPC-C benchmark [51]. We closely follow the implementation of the benchmark from Oltpbench [26], which makes the usual simplifications, e.g., regarding think times. In our port of TPC-C, we model each warehouse as a reactor, and configure database containers differently according to the experiment. We vary the number of client worker threads generating transaction invocations, and group these workers into a worker container separate from the database containers that host transaction executors and carry out transaction logic. Each client worker thread generates load for only one warehouse (reactor), thus modeling client affinity to a warehouse. To showcase ReactDB's ability to configure database architecture at deployment time, we experiment with the deployments described in Section 3.3.

Application Programs. We evaluate different application program formulations for the multi-transfer transaction added to Smallbank, exercising the asynchronous programming features of reactors. Similar to Figure 1(b), multi-transfer invokes multiple sub-transactions. In contrast to the figure, in some program variants, we force synchronous execution by immediately calling get on the future returned. The first formulation, fully-sync, invokes multiple transfer sub-transactions from the same source synchronously. Each transfer sub-transaction in turn invokes a synchronous credit sub-transaction on the destination account and a synchronous debit sub-transaction on the source account. The partially-async formulation behaves similarly; however, each transfer sub-transaction invokes an asynchronous credit on the destination account and a synchronous debit on the source account, overlapping half of the writes in the processing logic while still executing communication proportional to the transaction size sequentially. The fully-async formulation does not invoke transfer sub-transactions, but rather explicitly invokes asynchronous credit sub-transactions on the destination accounts and multiple synchronous debit sub-transactions on the source account. Thus, not only are roughly half of the writes overlapped, but also a substantial part of the communication across reactors. The final formulation, opt, is similar to the fully-async transaction, but performs a single synchronous debit to the source account for the full amount instead of multiple debits. Consequently, processing depth is further reduced and should roughly equal two writes, while communication should be largely overlapped.

In addition, we implement all transactions of TPC-C in our programming model. Unless otherwise stated, we overlap calls between reactors as much as possible in transaction logic by invoking sub-transactions across reactors asynchronously.
gram formulations for the multi-transfer transaction added to Small- our measurements of transaction latencies by the cost components
bank, exercising the asynchronous programming features of reac- in Figure 3, and further validate that our cost model is realizable in
tors. Similar to Figure 1(b), multi-transfer invokes multiple sub- ReactDB. We focus on the fully-sync and opt multi-transfer for-
transactions. In contrast to the figure, in some program variants, mulations described above, and vary the size of the multi-transfer
we force synchronous execution by immediately calling get on the transaction by changing the number of destination accounts simi-
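To make the contrast between these formulations concrete, the sketch below shows a fully synchronous multi-transfer next to a partially asynchronous one. It is a minimal illustration only: std::async and std::future stand in for ReactDB's asynchronous sub-transaction invocations, and credit/debit are simplified stand-ins for the Smallbank procedures. As noted above, calling get immediately after issuing the asynchronous call would collapse the variant back to synchronous execution.

```cpp
// Minimal sketch of two multi-transfer formulations. std::async stands in for
// an asynchronous sub-transaction on another reactor; credit/debit are stubs.
#include <functional>
#include <future>
#include <vector>

struct Account { long id; double balance; };

void credit(Account& a, double amount) { a.balance += amount; }  // destination reactor
void debit(Account& a, double amount)  { a.balance -= amount; }  // source reactor

// fully-sync: every credit and debit sub-transaction is executed synchronously.
void multi_transfer_fully_sync(Account& src, std::vector<Account>& dsts, double amt) {
  for (Account& dst : dsts) {
    credit(dst, amt);  // wait for the credit on the destination
    debit(src, amt);   // then perform the debit on the source
  }
}

// partially-async: the credit is issued asynchronously and overlapped with the
// debit on the source account; the future is joined before the next transfer.
void multi_transfer_partially_async(Account& src, std::vector<Account>& dsts, double amt) {
  for (Account& dst : dsts) {
    std::future<void> credit_done =
        std::async(std::launch::async, credit, std::ref(dst), amt);
    debit(src, amt);    // runs while the credit is in flight
    credit_done.get();  // join the asynchronous credit
  }
}
```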
4.2 Latency Control

4.2.1 Latency vs. Program Formulations. In this section, we show an experiment in which we vary the size of a multi-transfer transaction by increasing the number of destination accounts. Each destination is chosen on a different container out of the seven in our shared-nothing deployment. The latency for the different application program formulations is outlined in Figure 5. The observed curves match the trends predicted by the cost equation of Figure 3. First, as we increase transaction size, the processing and communication costs of a multi-transfer increase linearly across all formulations. Second, the highest latencies overall are for fully-sync, and latencies become lower as more asynchronicity is introduced in the formulations by overlapping sub-transaction execution. Third, there is a substantial gap between partially-async and fully-async, due to asymmetric costs between receiving procedure results and sending procedure invocations to other reactors. The latter manifests because of thread switching costs across cores in the receive code path, as opposed to atomic operations in the send code path. In opt, latency is further reduced when compared to fully-async by cutting the processing costs almost in half, which have a smaller impact than communication across cores. It is interesting to note that these optimizations can be done on the µsec scale. The programming model allows a developer to reduce the latency of a transaction from 86 µsec to 25 µsec by simple program reformulations without compromising consistency.

Figure 5: Latency vs. size and different user program formulations.
Figure 6: Latency breakdown into cost model components.
Figure 7: TPC-C throughput with varying load at scale factor 4.

4.2.2 Cost Model Breakdown. In this section, we break down our measurements of transaction latencies by the cost components in Figure 3, and further validate that our cost model is realizable in ReactDB. We focus on the fully-sync and opt multi-transfer formulations described above, and vary the size of the multi-transfer transaction by changing the number of destination accounts similarly to Figure 5. For each variant, we profiled the execution time of the programs in ReactDB into the components of the cost model of Figure 3. In addition, we used the profiling information from fully-sync for a transaction size of one to calibrate the parameters of the cost model for prediction, including processing and communication costs. From the parameter values for the single-transfer fully-sync run, we employed the cost equation of Figure 3 to predict the execution costs for other transaction sizes and for both the
fully-sync and opt program formulations. The predicted values are labeled fully-sync-pred and opt-pred.

We break down transaction latencies into the following components: (a) sync-execution: the cost of processing the logic in the transaction and in synchronous sub-transactions, corresponding to the first two components of the cost equation in Figure 3; (b) Cs and Cr: the forward and backward costs of communication between reactors in the third component of the cost equation; (c) async-execution: the cumulative execution cost of all asynchronous sub-transactions overlapped with any synchronous sub-transactions and processing logic, corresponding to the fourth component of the cost equation; (d) commit + input-gen: the cost of the commit protocol, including OCC and 2PC, along with the time to generate the inputs for the transaction. The latter cost component is not shown in Figure 3 since it only applies to root transactions and not to any sub-transaction in the reactor programming model. As such, we would expect the bulk of the difference between the predicted and observed performance to be explainable by this cost component.
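For intuition only, these components can be read as an additive decomposition of the observed latency of a root transaction. The form below is a hedged paraphrase using the component names of this section; the precise equation is the one in Figure 3 and is not reproduced here.

\[
\text{latency}(T) \;\approx\;
\underbrace{\text{sync-execution}(T)}_{\text{root logic and synchronous sub-transactions}}
\;+\;
\underbrace{\sum_{\text{cross-reactor calls}} (C_s + C_r)}_{\text{communication}}
\;+\;
\underbrace{\text{async-execution}(T)}_{\text{non-overlapped asynchronous work}}
\;+\;
\underbrace{C_{\text{commit}} + C_{\text{input-gen}}}_{\text{root transaction only, not part of Figure 3}}
\]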
Figure 6 shows that the predicted breakdown for the cost components closely matches the latencies profiled for actual executions in ReactDB, even at such a fine granularity. The slight difference in the overlap of different bars is within the variance of the observed vs. the calibration measurement, and expected especially since calibration measures parameters within the 5 µsec range. For a transaction size of one, we can see that opt has the same performance behavior as fully-sync. This effect arises because the destination transaction executor for the credit in the transfer is the same as the source transaction executor for the debit, resulting in a synchronous execution of the credit and debit sub-transactions similar to fully-sync. As we increase the transaction size, the number of transaction executors spanned by the transaction increases, and the execution costs of fully-sync grow because of increasing costs in sync-execution, Cs and Cr. We again observe here cost asymmetry between Cs and Cr, arising for the same reasons remarked in Section 4.2.1. For opt, we do not observe any sync-execution costs, since all credit sub-transactions are overlapped with each other and with the single debit on the source reactor. The growth in the async-execution cost of opt with increasing transaction size is caused by the rising communication cost for the sequence of credit sub-transactions, i.e., the last asynchronous credit sub-transaction incurs a cumulative cost of communication of all asynchronous sub-transactions before it in the sequence.

In summary, we observe that the latencies of transactions can be reliably profiled in our system ReactDB into the components of the cost model of Figure 3 even in a challenging scenario where the cost of the processing logic in the benchmark is extremely small and comparable to the cost of communication across reactors. This evidence indicates that it is possible for developers to observe and explain the latency behavior of programs with reactors and to reformulate their programs for better performance.

4.3 Virtualization of Database Architecture

4.3.1 Effect of Load. We evaluate the behavior of the three database architecture deployments described in Section 3.3 with the standard mix of the TPC-C benchmark under increasing load from client workers. We fixed the database size to a scale factor of four, corresponding to four warehouse reactors. Correspondingly, we deployed four transaction executors in all configurations, but increased the number of workers invoking transactions on the database from 1 to 8. As such, we expect the database to get progressively more overloaded from 5 to 8 workers.

Figure 8: TPC-C latency with varying load at scale factor 4.
Figure 9: Throughput of new-order-delay transactions with varying load.
Figure 10: Latency of new-order-delay transactions with varying load.

Figures 7 and 8 show the average throughput and latency of successful transactions in this scenario. We observe that shared-everything-with-affinity outperforms the other deployment configurations, since it exploits memory access affinities and minimizes cross-core communication overheads. Even though shared-nothing-async attempts to overlap cross-reactor sub-transactions as much as possible, the relative cost of dispatching sub-transactions with limited processing logic makes it more expensive than shared-everything-with-affinity, which employs direct memory accesses. We observe that the differences in throughput between the two methods are not large between 1 and 4 workers, since there is only a 1% chance of items for stock update in the new-order transaction and a 15% chance of customer lookups in the payment transaction being remote. Since workers employ client affinity to warehouses, note that for 5 workers, there are two worker threads generating workload for the first warehouse; for eight workers, there are two worker threads generating workload for every warehouse. So from 5 to 8 workers, the possibility of conflicts between concurrent transactions to the same warehouses arises, especially for the payment and new-order transactions.

As expected, therefore, from 1 to 4 workers, abort rates for all deployments are negligible and close to nil. Abort rates then go up to
4.1% for shared-everything-without-affinity and 5.72% for shared-nothing-async for eight workers. Interestingly, shared-everything-with-affinity is resilient to this effect, because each transaction executor runs a transaction to completion before picking up the next one from its queue. In other words, since shared-everything-with-affinity does not employ asynchronicity, transaction executor threads never block, and executors can operate with a multi-programming level of one. At the same time, affinity in routing ensures that each transaction executor services transactions for a different warehouse reactor, preserving identical execution behavior to that observed for workers 1 to 4 despite changes in the conflict patterns in workload generation. Even though shared-nothing-async also ensures affinity in transaction routing, it uses cooperative multi-threading and a higher multi-programming level for greater hardware utilization, and ends up executing multiple transactions concurrently from a given transaction executor's queue.

While we increase the number of client workers, we must remember that the database system processing resources in the form of transaction executors remain fixed. So the throughput on average increases for all the deployments because the hardware utilization goes up. The hardware utilization on each of the transaction executor cores for shared-everything-with-affinity increases from 0 to 83% between 1 and 4 workers, and is pushed further to 99% as we get to eight workers. Given the use of asynchronicity in shared-nothing-async, this deployment uses all cores from the start, with a hardware utilization for one worker on the four transaction executor cores of 76%, 2.5%, 2.5% and 2.5%, respectively. The hardware utilization of transaction executor cores with this deployment is uniform at 79% with four workers, and keeps on rising up to 98% for eight workers. For shared-everything-without-affinity, the hardware utilization is uniform throughout, rising from 17% to 66% from 1 to 4 workers and reaching only 84% with eight workers.

We observe that shared-everything-without-affinity exhibits the worst performance. At one worker, every transaction invocation is routed by the round-robin load balancing router to a different transaction executor than the one that processed the last request, amplifying cross-core communication. As workers are added this effect diminishes; however, the lower hardware utilization and the eventual rise in abort rates limit the efficiency of this architecture.

In short, we remark that the capability to virtualize database architecture allows ReactDB to be configured to maximize hardware utilization and minimize conflicts in a standard OLTP benchmark.

Moreover, we observe that asynchronicity in transactions engenders a trade-off between communication overheads and processing costs. We validate this observation by fitting our cost model to the TPC-C new-order transaction in Appendix C, and explore asynchronicity trade-offs further in the next section.

4.3.2 Asynchronicity Tradeoffs. To additionally drill down on the potential benefits of asynchronicity under concurrency, we evaluate in this section the two database architectures shared-nothing-async and shared-everything-with-affinity under varying load. We control the amount of load by varying the number of workers from 1 to 8, while keeping the number of warehouses constant at a scale factor of eight. For clarity, we focus exclusively on new-order transactions. Each new-order consists of between 5-15 items and we force each of the items to be drawn from a remote warehouse with equal probability. Since the default new-order formulation has limited parallelism in the logic executed at remote warehouses, we augmented the logic for stock data update with an artificial delay between 300 and 400 µsec by generating random numbers to model stock replenishment calculations. This increases the overall work in the transaction without increasing its data footprint and the contention on the database.
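As an illustration of how this extra work can be modeled, the sketch below burns roughly 300-400 µsec of random-number generation per stock update and dispatches one update per remote warehouse. It is a simplified stand-in only: std::async plays the role of an asynchronous sub-transaction on a warehouse reactor under shared-nothing-async, and the stock update itself is elided.

```cpp
// Sketch of the new-order-delay stock update: each remote warehouse performs
// its stock update plus 300-400 usec of random-number generation standing in
// for stock replenishment calculations (no extra data footprint or contention).
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <future>
#include <random>
#include <vector>

void stock_update_with_delay(int warehouse_id, int item_id, int quantity) {
  // ... apply the stock update for (warehouse_id, item_id, quantity) here ...
  std::mt19937_64 gen(static_cast<std::uint64_t>(warehouse_id) * 31 + item_id);
  std::uniform_int_distribution<int> target_us(300, 400);
  const auto deadline = std::chrono::steady_clock::now() +
                        std::chrono::microseconds(target_us(gen));
  std::uint64_t sink = 0;
  while (std::chrono::steady_clock::now() < deadline) {
    sink += gen();  // busy work: random-number generation only
  }
  (void)sink;
}

// Under shared-nothing-async, the 5-15 order lines are dispatched
// asynchronously to their warehouse reactors and joined before commit.
void new_order_delay(const std::vector<int>& warehouses, const std::vector<int>& items) {
  std::vector<std::future<void>> pending;
  for (std::size_t i = 0; i < items.size(); ++i) {
    pending.push_back(std::async(std::launch::async,
                                 stock_update_with_delay, warehouses[i], items[i], 1));
  }
  for (auto& f : pending) f.get();  // fork-join: all stock updates finish first
}
```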
Figures 9 and 10 show the throughput and latency, respectively, of running 100% new-order-delay transactions under increasing load. With one worker, the throughput of shared-nothing-async is double that of shared-everything-with-affinity. The former executes all the stock updates across 5-6 remote warehouses asynchronously (average distinct remote warehouses chosen from 7 using a uniform distribution), fully utilizing the available hardware parallelism, while the latter executes the entire transaction logic sequentially. Although shared-nothing-async incurs higher communication cost in dispatching the stock updates to be performed by different warehouse reactors, the greater amount of work in each stock update makes it worthwhile in comparison to sequential shared memory accesses in shared-everything-with-affinity. Conversely, as we increase the number of workers and thus pressure on resources, the throughput of shared-nothing-async starts growing less than that of shared-everything-with-affinity. Note that the abort rate for the deployments was negligible (0.03-0.07%), highlighting the limited amount of contention on actual items.

In summary, these results suggest that the most effective database architecture may change depending on load conditions when asynchronicity can be exploited by transaction code. Under high
load, shared-everything-with-affinity exhibits the best performance among the architectures evaluated, since it reduces overhead at the expense of not utilizing at all intra-transaction parallelism. On the other hand, when load conditions are light to normal and when transaction logic comprises enough parallelism, shared-nothing-async can achieve substantially higher throughput and lower latency. To further validate these observations, we evaluate in Appendix D the effects of varying cross-reactor accesses in the TPC-C benchmark under conditions of high load.

5 RELATED WORK

In-memory OLTP Databases. H-Store [50] and HyPer [30] follow an extreme shared-nothing design by having single-threaded execution engines responsible for each data partition. As a result, single-partition transactions are extremely fast, but multi-partition transactions and skew greatly affect system throughput. LADS [56] improves upon this limitation by merging transaction logic and eliminating multi-partition synchronization through dynamic analysis of batches of specific transaction classes. In contrast to these shared-nothing engines, shared-everything lock-based OLTP systems specifically designed for multi-cores, such as DORA [42] and PLP [43], advocate partitioning of internal engine data structures for scalability. Orthrus [47] partitions only the lock manager and utilizes a message-passing design for lock acquisition to reduce lock contention across multiple cores for contended workloads.

In contrast to the baked-in architectural approach of earlier engines, ReactDB borrows the highly-scalable OCC implementation of Silo [52], building on top of it a virtualization layer that allows for flexible architectural deployments, e.g., as a classic shared-everything engine, a shared-nothing engine, or an affinity-based shared-everything engine. In addition, ReactDB is not restricted to specific transaction classes, supporting transactions with, e.g., user-defined aborts, conditionals, and range queries. Finally, ReactDB is the first engine realizing the programming model of reactors.

Transactional Partitioned Data Stores. A class of systems provides transactional support over key-value stores as long as keys are co-located in the same machine or key group [16, 17]. Warp [25], in contrast, provides full transaction support with nested transactions, but limits query capabilities, e.g., no predicate reads are provided nor relational query support. The limited transactional support and low-level storage-based programming model make it difficult to express applications as opposed to the reactor programming model, which provides serializable transactions with relational query capabilities. Recent work has also focused on enhancing concurrency through static analysis of transaction programs [38, 58]. The latter could be assimilated in the implementation of ReactDB's concurrency control layers as future work.

Asynchronous Programming. As mentioned previously, reactors are a novel restructuring in the context of databases of the actor model [2]. In contrast to regular actors, reactors comprise an explicit memory model with transactions and relational querying, substantially simplifying program logic. These features make the reactor model differ significantly from the virtual actors of Orleans [7] and from other actor-based frameworks [4, 27]. Recent work in Orleans has focused on a vision of integrating traditional data-management functionality in a virtual actor runtime for the middle tier of a classic three-tier architecture [9, 23]. This approach is complementary to our work of integrating actor features in a database system, i.e., enriching the data tier itself. Additionally, ReactDB comprises building a high-performance, scalable, multi-core OLTP system with an actor-oriented programming model and latency control, which is not the target design and feature set of the vision for Orleans [9].

As explained in Section 2.2, reactors are related to the early work on Argus [36] because of the asynchronous transactional programming model supporting nested function calls; however, the reactor programming model is substantially different from that of Argus. First, the use of a relational data and query model is a central idea of reactors, but not of Argus. Note that the latter is not a simple restriction of the former, because the programming issues handled by a relational abstraction, e.g., physical data independence, would need to be coded from scratch at a very low level in Argus. Second, user-defined logical actors are a central idea of reactors, but not of Argus. Guardians in Argus employ multiple concurrent processes explicitly, while reactors abstract away the mapping to threads in ReactDB. Third, reasoning about latency from the programming model is a central idea of reactors, but again not of Argus. Even though Argus has low-level asynchronous calls, it lacks an explicit cost model of synchronous and asynchronous communication. On the system implementation level, ReactDB is an OLTP database system designed for low-overhead virtualization of database architecture, which was never the focus of Argus. These differences to Argus also distinguish our work from a large class of object-oriented distributed computing and operating systems [11, 14, 18, 33, 41].

Database Virtualization. Virtualization of database engines for cloud computing has focused on particular target database architectures, e.g., shared-nothing databases with transactional support only within partitions [8] or distributed control architectures with weaker consistency guarantees [31]. By contrast, ReactDB offers infrastructure engineers the possibility to configure database architecture itself by containerization, while maintaining a high degree of transaction isolation. Our results support recent observations of low overhead of use of container mechanisms together with an in-memory database [39], while showing that even more flexibility in database architecture can be achieved at negligible cost.

The design of ReactDB is reminiscent of work on OLTP on hardware islands [44] and on the cost of synchronization primitives [19]. Complementary to recent work on compiler optimizations [46], reactors can be seen as a step towards improving programmability and support for stored procedures in database systems.

6 CONCLUSION

In this paper, we introduced reactors, a new relational abstraction for in-memory OLTP databases. Reactors comprise an asynchronous programming model that allows encoding of intra-transaction parallelism in database stored procedures while maintaining serializability. We presented the design of ReactDB, the first implementation of reactors. ReactDB enables flexible and controllable database architecture configuration at deployment time. Reactors open up a variety of directions for future work, ranging from reactor database modeling to efficient mapping of reactors to distributed hardware architectures.

Acknowledgments. We thank the anonymous reviewers for their insightful comments and Phil Bernstein for discussions and suggestions, which have helped us improve this paper.
REFERENCES [31] Donald Kossmann, Tim Kraska, and Simon Loesing. 2010. An evaluation of
[1] Alok Aggarwal and Jeffrey Scott Vitter. 1988. The Input/Output Complexity of alternative architectures for transaction processing in the cloud. In Proc. ACM
Sorting and Related Problems. Commun. ACM 31, 9 (1988), 1116–1127. SIGMOD. 579–590.
[2] Gul Agha. 1986. Actors: A Model of Concurrent Computation in Distributed Systems. [32] Per-Åke Larson (Ed.). 2013. IEEE Data Engineering Bulletin: Special Issue on
MIT Press, Cambridge, MA, USA. Main-Memory Database Systems. Vol. 36, number 2.
[3] Mohammad Alomari, Michael J. Cahill, Alan Fekete, and Uwe Röhm. 2008. The [33] Edward D. Lazowska, Henry M. Levy, Guy T. Almes, Michael J. Fischer, Robert J.
Cost of Serializability on Platforms That Use Snapshot Isolation. In Proc. ICDE. Fowler, and Stephen C. Vestal. 1981. The Architecture of the Eden System. In
576–585. Proc. SOSP. 148–159.
[4] Joe Armstrong. 2010. Erlang. Commun. ACM 53, 9 (2010), 68–75. [34] Christophe Lécluse and Philippe Richard. 1989. The O2 Database Programming
[5] Subi Arumugam, Ravi Jampani, Luis Leopoldo Perez, Fei Xu, Christopher M. Language. In Proc. VLDB. 411–422.
Jermaine, and Peter J. Haas. 2010. MCDB-R: Risk Analysis in the Database. [35] Daan Leijen, Wolfram Schulte, and Sebastian Burckhardt. 2009. The design of a
PVLDB 3, 1 (2010), 782–793. task parallel library. In Proc. OOPSLA. 227–242.
[6] Catriel Beeri, Philip A. Bernstein, and Nathan Goodman. 1989. A Model for [36] Barbara Liskov, Dorothy Curtis, Paul Johnson, and Robert Scheifler. 1987. Imple-
Concurrency in Nested Transactions Systems. J. ACM 36, 2 (1989), 230–269. mentation of Argus. In Proc. SOSP. 111–122.
[7] Philip Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, and Jorgen Thelin. [37] B. Liskov and L. Shrira. 1988. Promises: Linguistic Support for Efficient Asyn-
2014. Orleans: Distributed Virtual Actors for Programmability and Scalability. chronous Procedure Calls in Distributed Systems. In Proc. PLDI. 260–267.
Technical Report MSR-TR-2014-41. Microsoft Research. [38] Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, and Jinyang Li. 2014. Extracting
[8] Philip A. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kalhan, Gopal More Concurrency from Distributed Transactions. In Proc. OSDI. 479–494.
Kakivaya, David B. Lomet, Ramesh Manne, Lev Novik, and Tomas Talius. 2011. [39] Tobias Mühlbauer, Wolf Rödiger, Andreas Kipf, Alfons Kemper, and Thomas
Adapting microsoft SQL server for cloud computing. In Proc. ICDE. 1255–1263. Neumann. 2015. High-Performance Main-Memory Database Systems and Modern
[9] Philip A. Bernstein, Mohammad Dashti, Tim Kiefer, and David Maier. 2017. Virtualization: Friends or Foes?. In Proc. DanaC. 4:1–4:4.
Indexing in an Actor-Oriented Database. In Proc. CIDR. [40] Vivek R. Narasayya, Sudipto Das, Manoj Syamala, Badrish Chandramouli, and
[10] Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. 1987. Concurrency Surajit Chaudhuri. 2013. SQLVM: Performance Isolation in Multi-Tenant Rela-
Control and Recovery in Database Systems. Addison-Wesley. tional Database-as-a-Service. In Proc. CIDR.
[11] Kenneth P. Birman. 1985. Replication and Fault-Tolerance in the ISIS System. In [41] Howard Oakley. 1989. Mercury: an operating system for medium-grained par-
Proc. SOSP. 79–86. allelism. Microprocessors and Microsystems - Embedded Hardware Design 13, 2
[12] Andrew Brook. 2015. Low-latency distributed applications in finance. Commun. (1989), 97–102.
ACM 58, 7 (2015), 42–50. [42] Ippokratis Pandis, Ryan Johnson, Nikos Hardavellas, and Anastasia Ailamaki.
[13] Michael J. Carey, David J. DeWitt, and Scott L. Vandenberg. 1988. A Data Model 2010. Data-oriented Transaction Execution. PVLDB 3, 1-2 (2010), 928–939.
and Query Language for EXODUS. In Proc. ACM SIGMOD. 413–423. [43] Ippokratis Pandis, Pinar Tözün, Ryan Johnson, and Anastasia Ailamaki. 2011.
[14] Panos K. Chrysanthis, Krithi Ramamritham, David W. Stemple, and Stephen PLP: Page Latch-free Shared-everything OLTP. PVLDB 4, 10 (2011), 610–621.
Vinter. 1986. The Gutenberg Operating System Kernel. In Proc. FJCC. 1159–1167. [44] Danica Porobic, Ippokratis Pandis, Miguel Branco, Pinar Tözün, and Anastasia
[15] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Ailamaki. 2012. OLTP on Hardware Islands. PVLDB 5, 11 (2012), 1447–1458.
Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proc. ACM SoCC. [45] PostgreSQL Documentation 2017. Parallel Safety in PostgreSQL 10 (Section 15.4).
143–154. (October 2017). [Link]
[16] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. 2010. G-Store: a scalable [46] Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson,
data store for transactional multi key access in the cloud. In Proc. ACM SoCC. César A. Galindo-Legaria, and Conor Cunningham. 2017. Optimization of Imper-
163–174. ative Programs in a Relational Database. PVLDB 11, 4 (2017), 432–444.
[17] Sudipto Das, Amr El Abbadi, and Divyakant Agrawal. 2009. ElasTraS: An Elas- [47] Kun Ren, Jose M. Faleiro, and Daniel J. Abadi. 2016. Design Principles for Scaling
tic Transactional Data Store in the Cloud. In Workshop on Hot Topics in Cloud Multi-core OLTP Under High Contention. In Proc. ACM SIGMOD. 1583–1598.
Computing, HotCloud. [48] R. Shoup and D. Pritchett. 2006. The eBay architecture. In SD Forum.
[18] Partha Dasgupta, Richard J. LeBlanc, and William F. Appelbe. 1988. The Clouds [49] Michael Stonebraker. 2012. New opportunities for New SQL. Commun. ACM 55,
Distributed Operating System. In Proc. ICDCS. 2–9. 11 (2012), 10–11.
[19] Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. 2013. Everything You [50] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos,
Always Wanted to Know About Synchronization but Were Afraid to Ask. In Proc. Nabil Hachem, and Pat Helland. 2007. The End of an Architectural Era: (It’s Time
SOSP. 33–48. for a Complete Rewrite). In Proc. VLDB. 1150–1160.
[20] Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, [51] TPCC 2017. The TPC-C Benchmark. (May 2017). [Link]
Ryan Stonecipher, Nitin Verma, and Mike Zwilling. 2013. Hekaton: SQL Server’s [52] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden.
Memory-optimized OLTP Engine. In Proc. ACM SIGMOD. 1243–1254. 2013. Speedy Transactions in Multicore In-memory Databases. In Proc. SOSP.
[21] Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudré- 18–32.
Mauroux. 2013. OLTP-Bench: An Extensible Testbed for Benchmarking Relational [53] Pattishall. D. V. 2007. Federation at Flickr (doing billions of queries per day). In
Databases. PVLDB 7, 4 (2013), 277–288. MySQL Conference.
[22] Aleksandar Dragojević, Dushyanth Narayanan, Edmund B. Nightingale, Matthew [54] Gerhard Weikum and Gottfried Vossen. 2002. Transactional Information Systems:
Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. 2015. No Compro- Theory, Algorithms, and the Practice of Concurrency Control and Recovery. Morgan
mises: Distributed Transactions with Consistency, Availability, and Performance. Kaufmann.
In Proc. SOSP. 54–70. [55] Walker M. White, Christoph Koch, Nitin Gupta, Johannes Gehrke, and Alan J.
[23] Tamer Eldeeb and Phil Bernstein. 2016. Transactions for Distributed Demers. 2007. Database research opportunities in computer games. SIGMOD
Actors in the Cloud. Technical Report MSR-TR-2016-1001. Microsoft Record 36, 3 (2007), 7–13.
Research. [Link] [56] Chang Yao, Divyakant Agrawal, Gang Chen, Qian Lin, Beng Chin Ooi, Weng-Fai
transactions-distributed-actors-cloud-2/ Wong, and Meihui Zhang. 2016. Exploiting Single-Threaded Model in Multi-Core
[24] E. N. Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson. 2002. A In-Memory Systems. IEEE Trans. Knowl. Data Eng. 28, 10 (2016), 2635–2650.
survey of rollback-recovery protocols in message-passing systems. ACM Comput. [57] Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael
Surv. 34, 3 (2002), 375–408. Stonebraker. 2014. Staring into the Abyss: An Evaluation of Concurrency Control
[25] Robert Escriva, Bernard Wong, and Emin Gün Sirer. 2015. Warp: Lightweight with One Thousand Cores. PVLDB 8, 3 (2014), 209–220.
Multi-Key Transactions for Key-Value Stores. CoRR abs/1509.07815 (2015). [58] Yang Zhang, Russell Power, Siyuan Zhou, Yair Sovran, Marcos K. Aguilera, and
[26] H-Store 2017. H-Store Supported Benchmarks (Smallbank). (May 2017). http: Jinyang Li. 2013. Transaction chains: achieving serializability with low latency
//[Link]/documentation/deployment/benchmarks/ in geo-distributed storage systems. In Proc. SOSP. 276–291.
[27] Yaroslav Hayduk, Anita Sobe, and Pascal Felber. 2015. Dynamic Message Pro- [59] Wenting Zheng, Stephen Tu, Eddie Kohler, and Barbara Liskov. 2014. Fast
cessing and Transactional Memory in the Actor Model. In Proc. DAIS. 94–107. Databases with Fast Durability and Recovery Through Multicore Parallelism. In
[28] Joseph Hellerstein, Michael Stonebraker, and James Hamilton. 2007. Architecture Proc. OSDI. 465–477.
of a Database System. Foundations and Trends in Databases 1, 2 (2007), 141–259.
[29] Jiamin Huang, Barzan Mozafari, Grant Schoenebeck, and Thomas F. Wenisch. A PROOF OF THEOREM 2.7
2017. A Top-Down Approach to Achieving Performance Predictability in Data-
base Systems. In Proc. ACM SIGMOD. 745–758.
Let us assume H is serializable and H ′ is not serializable. From the
[30] Alfons Kemper and Thomas Neumann. 2011. HyPer: A Hybrid OLTP&OLAP serializability theorem, since H is serializable, the serializability
Main Memory Database System Based on Virtual Memory Snapshots. In Proc. graph of H (SG (H )) is acyclic; since the projected history H ′ is
ICDE. 195–206.
not serializable, the serializability graph SG (H ′ ) must be cyclic.
Therefore, there must exist a cycle Ti′ → . . . → T j′ → . . . → Ti′ . Cross 1 worker 4 workers
Reactor Latency Latency
Since the graph is built on operations of the classic transactional TPS TPS
Access msec msec
model, then there must be conflicting operations oi′ < P H . . . < P H
% Obs Pred Pred+C+I Obs Obs Obs
o j′ < P H . . . <P H oi′ . By condition (3) of Definition 2.6, there must 1 6921 0.131 0.148 0.144 27091 0.148
k ∈ T and ST
exist sub-transactions STi,l k′ k <
∈ T j such that STi,l
i j,l ′ H 100 5246 0.159 0.189 0.191 14485 0.277
k < . . . < ST k . As a result, SG (H ) must be cyclic,

. . . <H ST j,l Table 1: TPC-C new-order performance at scale factor 4.
′ H H i,l
and we arrive at a contradiction. To show the reverse direction, it
keys for multi_update, for different number of workers along
is simple to follow a similar argument, but starting with a cyclic
with the cost model predictions. For one worker, the observed
graph in SG(H) and showing that SG(H’) must be cyclic as well in
average latency decreases as skew increases, given that more of the
contradiction.
sub-transactions invoked in multi_update become synchronous.
B EFFECT OF SKEW AND QUEUING ON This effect arises since the communication overhead to dispatch
LATENCY a sub-transaction to a remote reactor is greater than the time to
In this appendix, we drill down experimentally into the limitations process a single update. This trend is also captured in our cost
discussed for our cost model in Section 2.4, evaluating a scenario model prediction, 1 worker pred, which shows decreasing average
with queuing delays and skew. latency as the zipfian constant increases to 0.99. Interestingly, the
Setup. We created a separate setup to evaluate limitations in ex- curve predicts that the average latency should increase after 0.99,
planatory ability of our cost model by augmenting the YCSB bench- which conflicts with our observation. This is because the cost of
mark [15] with a multi_update transaction that updates 10 keys. processing actually decreases when all updates are to the same
We model each key as a reactor, and the multi_update trans- key in the transaction. In this case, the read in a read-modify-write
action invokes a read-modify-write update sub-transaction asyn- can just return the previous updated value from the transaction’s
chronously for each key (record size of 100 bytes). Unlike in the write-set instead of accessing the index structure, which was not
setup with Smallbank, we configure ReactDB such that transac- accounted for while calibrating processing cost. In addition to 1
tions may not realize full parallelism at the system level. In particu- worker pred, we also show a curve adding to the prediction the
lar, we chose a scale factor of 4 where each scale factor corresponds measured costs of input generation and commitment. We remind
to 10,000 keys. Four containers are deployed, each with one transac- the reader that the latter two costs are not part of the equation in
tion executor, and assigned 10,000 contiguous reactors correspond- Figure 3, and as in Section 4.2.2, the difference between predicted
ing to the scale factor. Furthermore, we select the reactor where and observed latencies for a single worker are largely accounted
the multi_update transaction is invoked randomly from the set of by these two factors.
10 keys in the update, and choose the keys for multi_update from With four workers, queueing and skew together lead to both
a zipfian distribution. As such, we model program logic where it higher and more variable latencies overall, which are as expected
is not possible to determine statically how many sub-transactions not captured by the cost model. Abort rates also increase, going
will be realized by the system synchronously or asynchronously. from 0.26% to 3.24% as the zipfian constant increases from 0.01 to
However, to ensure that transactions remain fork-join, we sorted 0.99. Hardware utilization on the transaction executor core hosting
the keys to be updated in a multi_update transaction such that the most accessed reactors reflects the trend in the figure, corre-
keys deployed in remote transaction executors precede the ones in sponding to 63%, 92% and 100% at zipfian constants 0.01, 0.5 and
the local transaction executor where the transaction is initiated. Fi- 0.99. At 5.0 skew, a single reactor is accessed, eliminating variability
nally, the experiment is performed both with one and four workers but not queueing.
to simulate absence and presence of queuing delays, respectively.
Results. We employ the multi_update transaction under a 100% C COST MODEL VALIDATION WITH TPC-C
mix. Recall that our setup explained above explicitly restricts the NEW-ORDER
parallelism that can be realized by the system. So to be able to We observed in Section 4.3.1 that shared-nothing-async under-
fit our cost model, we recorded the average sizes of the realized performs compared to shared-everything-with-affinity even with
syncovp (STi,k j ) and async(STi,k j ) sequences with the zipfian distri- a single client worker, despite having more transaction executors
bution employed. Similar to the experiment in Section 4.2.2, we available for transaction processing. To better understand this effect,
calibrated the average communication and processing cost param- we turned to our cost model for validation of a mix consisting of
eters by profiling the multi_update transaction with updates to 100% new-order transactions with four warehouses and four trans-
a single key chosen from a uniform distribution. We emphasize action executors. Similarly to what was done in Section 4.2.2 and
that the cost model abstracts intra-transaction parallelism to aid Appendix B, we calibrated cost model parameters with a new-order
developers in contrasting transaction program formulations, and transaction requesting only one item from a local and one from a
thus does not capture interference among concurrent workers or remote warehouse. Moreover, we recorded the average numbers
queueing effects. As such, we expect cost model predictions to be of synchronous and asynchronous stock-update requests realized
most accurate in configurations where a single worker is deployed, with a single worker, as well as the average numbers of items in-
as in Section 4.2. volved. To explore the impact of communication overheads, we
Figure 11 shows the observed behavior of average latency with evaluated two extremes with respectively 1% and 100% probability
varying skew, as captured by the zipfian constant used to select of cross-reactor accesses in stock updates (see also Appendix D).
Figure 11: Effect of skew and queuing on latency.
Figure 12: Latency of cross-reactor TPC-C new-order (scale factor 8).
Figure 13: TPC-C throughput with varying deployments.
Table 1 summarizes the results obtained with one and four work- same latency at 0% cross-reactor transactions as shared-everything-
ers. We obtain excellent fit between the cost model prediction after with-affinity. However, there is a sharp drop in the performance of
including commit and input generation cost (Pred+C+I) and the ob- shared-nothing deployments from 0% to 10% cross-reactor trans-
served latency for one worker under both 1% and 100% cross-reactor actions. This effect is in line with our previous observation that
accesses. We observed that the cost of communication between re- sub-transaction invocations require expensive migration of con-
actors, especially Cr as remarked in Section 4.2.1, is high compared trol in contrast to both shared-everything-without-affinity and
with the processing cost of stock updates. However, the relatively shared-everything-with-affinity. Note that the abort rate for all
small growth in the latency for 100% cross-reactor access with one deployments remained negligible (0.02%-0.04%), highlighting the
worker compared with 1% cross-reactor accesses shows the bene- limited amount of contention on actual items.
fits of overlapping multiple asynchronous sub-transactions across We observe that shared-nothing-async exhibits higher resilience
reactors. With four workers, queueing effects manifest with 100% to increase in cross-reactor transactions when compared with shared-
cross-reactor accesses, which are as discussed in Appendix B not nothing-sync. In particular, latency of shared-nothing-async is bet-
part of the cost model, though prediction is still accurate for 1% ter by roughly a factor of two at 100% cross-reactor transactions.
cross-reactor accesses. This is because shared-nothing-async employs new-order transac-
The hardware utilization with one worker on the four transaction tions with asynchronous sub-transaction invocations on remote
executor cores under 1% cross-reactor accesses is 78%, 2.3%, 2.3%, warehouse reactors, and tries to overlap remote sub-transaction
and 2.3%, respectively, while the values under 100% cross-reactor invocation with execution of logic locally on a warehouse reactor.
accesses are 65%, 24%, 24%, and 24%. The use of four workers in- This demonstrates how application programs can leverage the pro-
creases utilization, which becomes uniform at 82% and 86% under gramming model to engineer application code using reactors with
1% and 100% cross-reactor accesses, respectively. different performance characteristics. At the same time, infrastruc-
ture engineers can select the database architecture that best fits the
D EFFECT OF CROSS-REACTOR execution conditions for the workload without changes to appli-
TRANSACTIONS cation code. In the case of peak load and limited intra-transaction
In this appendix, we evaluate the impact of cross-reactor transac- parallelism, shared-everything-with-affinity turned out to be the
tions on the performance of the different database architectures, best architecture among the ones considered for this scenario, in
complementing the results of Section 4.3. For clarity, we focus line with the results of [52].
exclusively on new-order transactions without any additional com-
putations. To vary the percentage of cross-reactor new-order trans- E SCALE-UP AND OVERHEAD IN REACTDB
actions, we vary the probability that a single item in the transaction This appendix further complements Section 4.3 by presenting trans-
is drawn from a remote warehouse (the remote warehouses are actional scale-up in ReactDB to the degree allowed by our hard-
again chosen with an equal probability). Each new-order consists ware setup and discussing containerization overhead.
of between 5-15 items. Transactional Scale-Up. In this section, we evaluate the scala-
Figure 12 shows the average latencies of running the TPC-C bility of ReactDB across multiple cores for the three database
benchmark with 100% new-order transactions at a scale factor of architecture deployments described in Section 3.3. Figure 13 shows
eight, i.e., eight warehouses receive transactions from eight workers the average transaction throughput of running the TPC-C trans-
at peak load. Due to space constraints, we omit measured through- action mix as we increase the number of warehouses (reactors).
puts. Since ReactDB uses Silo’s OCC protocol and owing to the low While we have also measured latencies, we omit the results due to
contention in TPC-C even upon increasing the number of remote space constraints. Note that we configure the number of transaction
items, we would expect the latency of shared-everything-with- executors to be equal to the scale factor for the experiment.
affinity to be agnostic to changes in the proportion of cross-reactor We observe that the shared-everything-without-affinity deploy-
transaction as per the results in [52]. However, we see a small ment exhibits the worst throughput scalability among the deploy-
gradual increase in the latency for all the deployments. We believe ments selected. This effect is a consequence of shared-everything-
these effects are a consequence of cache coherence and cross-core without-affinity’s poor ability to exploit memory access affinities
communication overheads. We observe further that both shared- within each transaction executor, given round-robin routing of
nothing-sync and shared-nothing-async configurations exhibit the transactions. On the other hand, shared-everything-with-affinity
100
query-parallelism leveraging procedure-level parallelism in the reactor model. In Fig-
procedure-parallelism
sequential ure 1(a), we see the formulation of auth_pay in the classic transac-
80
Avg latency [msec] tional model. If the orders relation were partitioned, the join query
60 between provider and orders would be eligible for parallelization
by a query optimizer in a traditional relational database. Note that
40
the optimizer would consider query-level parallelization, and not
20 a holistic procedure-level parallelization achievable by manual de-
composition of the join in the reactor programming model as shown
0
101 102 103 104 105 106
Random numbers per provider, log scale
in Figure 1(b). In the latter, the potentially expensive computation
Figure 14: Latency of query- vs. procedure-level parallelism. of sim_risk would also be parallelized across reactors.
For this experiment, we configured ReactDB with up to 16 trans-
and shared-nothing-async both take advantage of access affinities
action executors. To focus on intra-transaction parallelism, a single
and behave similarly. We see that shared-everything-with-affinity
worker generates auth_pay transaction invocations targeted at one
is slightly superior to shared-nothing-async. The difference lies in
Exchange and 15 Provider reactors, used to express three program
the relative costs in these deployments of sub-transaction invoca-
execution strategies in ReactDB. Both the sequential and query-
tions vs. direct memory accesses of data for remote warehouses. For
parallelism strategies encode the program of Figure 1(a) for the clas-
a scale factor of one, there are no cross-reactor transactions, and the
sic transactional model. In sequential, we deploy a single container
performance of the two deployments is identical. From a scale fac-
and transaction executor for all reactors in ReactDB, and thus the
tor of two onwards, the probabilities of cross-reactor transactions
whole procedure body executes in a single hardware thread. In
range between 0% to 10%, as discussed in Section 4.3.1. In shared-
query-parallelism, by contrast, we horizontally decompose orders
nothing-async, a sub-transaction call is routed to its corresponding
across the Provider reactors by deploying one container and trans-
transaction executor, incurring context switching and communica-
action executor for each reactor. Similar to a partitioned database,
tion overheads. By contrast, since shared-everything-with-affinity
this results in a parallel foreign key join between providers and
executes the sub-transaction in the same transaction executor, the
orders fragments, but sequential execution of the remainder of the
remote call costs are traded off for the relatively smaller costs of
procedure. Finally, the procedure-parallelism strategy encodes the
cache pressure. We also ran the experiment with all the transaction
program of Figure 1(b). Using the same deployment as for query-
classes in the TPC-C mix invoking sub-transactions synchronously
parallelism, this achieves holistic procedure parallelization in the
in the shared-nothing deployment (shared-nothing-sync configura-
reactor model. The inputs for auth_pay were chosen using uni-
tion described in Section 3.3). However, the throughput and latency
form distributions, and the orders relation was loaded with 30,000
of this configuration was close (within the variance bars) to the
records per provider, mirroring the cardinality of the orders re-
shared-nothing-async configuration because of the low percentage
lation in the TPC-C benchmark. To simulate another transaction
of remote warehouse calls in the default TPC-C mix. We hence omit
settling orders and keep the number of records scanned in the
the curve from Figure 13 for brevity.
orders relation fixed, a pre-configured window of records is re-
In short, the results indicate that ReactDB can be flexibly con-
verse range scanned ordered by time per provider. This window
figured with different database architectures to achieve adequate
parameter has a direct effect on the benefit of query-parallelism
transactional scalability in our available hardware setup for the
compared with sequential. For clarity, we tuned this value to 800
standard TPC-C workload. Further, high affinity of data accesses to
records per provider to ensure that query-parallelism would outper-
physical processing elements (cores) is crucial to performance.
form sequential by 4x when sim_risk is not invoked. We simulated
Containerization Overheads. To account for the overhead of
the computational load of sim_risk by random number generation
containerization, we also ran ReactDB while submitting empty
similar to Section 4.3.2. We also loaded our data values such that
transactions with concurrency control disabled. We observe roughly
sim_risk is always invoked and transactions are not aborted due
constant overhead per transaction invocation across scale factors
to application logic failures.
of around 22 µsec. We measured that thread switching overhead
Figure 14 shows average latencies as we vary the computational
between the worker and transaction executor across different cores
load in sim_risk by generating progressively more random num-
is a largely dominant factor and is dependent on the machine used.
bers. We observe that procedure-parallelism is more resilient to
The overhead accounted for 18% of average TPC-C transaction
changes in computational load and better utilizes the available paral-
latency. When compared with executing the TPC-C application
lel hardware in this benchmark. At the extreme of 106 random num-
code directly within the database kernel without worker threads
bers generated per provider, the latency of procedure-parallelism
generating transaction invocations that are separate from database
is a factor of 8.14x and 8.57x lower than that of query-parallelism
processing threads, as in Silo, the overhead is significant, but if a
and sequential, respectively. The transaction executor core hosting
database engine with kernel thread separation is assumed, as is the
the Exchange reactor becomes overloaded at 100% utilization un-
normal case, the overhead is negligible.
der query-parallelism, while with procedure-parallelism utilization
remains uniform at 84% across executor cores hosting Provider
F PROCEDURE-LEVEL PARALLELISM WITH reactors and 14% in the Exchange executor core. In sequential, the
REACTORS transaction executor mapped to all reactors becomes overloaded at
In this section, we revisit the example introduced in Figure 1 and 100% even when the computational load is zero.
evaluate the potential performance gains that can be obtained by
