

Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries

arXiv:1910.09017v8 [cs.DB] 30 Aug 2023

MACIEJ BESTA, Department of Computer Science, ETH Zurich
ROBERT GERSTENBERGER, Department of Computer Science, ETH Zurich
EMANUEL PETER, Department of Computer Science, ETH Zurich
MARC FISCHER, PRODYNA (Schweiz) AG
MICHAŁ PODSTAWSKI, Future Processing
CLAUDE BARTHELS, Department of Computer Science, ETH Zurich
GUSTAVO ALONSO, Department of Computer Science, ETH Zurich
TORSTEN HOEFLER, Department of Computer Science, ETH Zurich

Graph processing has become an important part of multiple areas of computer science, such as machine
learning, computational sciences, medical applications, social network analysis, and many others. Numerous
graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also
dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and
edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving,
and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing,
these systems face unique design challenges. To facilitate the understanding of this emerging domain, we
present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing
fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or
object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization
techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects
of data distribution and query execution (e.g., support for sharding and ACID). 51 graph database systems
are presented and compared, including Neo4j, OrientDB, and Virtuoso. We outline graph database queries
and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms).
Finally, we describe research and engineering challenges to outline the future of graph databases.
CCS Concepts: • General and reference → Surveys and overviews; • Information systems → Data
management systems; Graph-based database models; Data structures; DBMS engine architectures;
Database query processing; Parallel and distributed DBMSs; Database design and models; Distributed
database transactions; • Theory of computation → Data modeling; Data structures and algorithms for data
management; Distributed algorithms; • Computer systems organization → Distributed architectures;
Additional Key Words and Phrases: Graphs, Graph Databases, NoSQL Stores, Graph Database Management
Systems, Graph Models, Data Layout, Graph Queries, Graph Transactions, Graph Representations, RDF,
Labeled Property Graph, Triple Stores, Key-Value Stores, RDBMS, Wide-Column Stores, Document Stores

1 INTRODUCTION
Graph processing is behind numerous problems in computing, for example in medicine, machine
learning, computational sciences, and others [122, 147]. Graph algorithms are inherently difficult
to design because of challenges such as large sizes of processed graphs, little locality, or irregular
communication [42, 61, 143, 147, 192, 215]. The difficulties are increased by the fact that many
such graphs are also dynamic (their structure changes over time) and have rich data, for example
arbitrary properties or labels, associated with vertices and edges.
Graph databases (GDBs) such as Neo4j [188] emerged to enable storing, processing, and analyzing
large, evolving, and rich graph datasets. Graph databases face unique challenges due to overall
properties of irregular graph computations combined with the demand for low latency and high
throughput of graph queries that can be both local (i.e., accessing or modifying a small part of the
graph, for example a single edge) and global (i.e., accessing or modifying a large part of the graph,
for example all the edges). Many of these challenges belong to the following areas: “general design”
(i.e., what is the most advantageous general structure of a graph database engine), “data models
and organization” (i.e., how to model and store the underlying graph dataset), “data distribution”
(i.e., whether and how to distribute the data across multiple servers), and “transactions and queries”
(i.e., how to query the underlying graph dataset to extract useful information). This distinction
is illustrated in Figure 1. In this work, we present the first survey and taxonomy on these system
aspects of graph databases.
[Figure 1 diagram: an overview of the considered areas. It spans graph databases vs. other classes of graph systems (§ 2); data models, including conceptual graph data models (§ 3.3), graph structure representations (§ 3.2), and non-graph data models (§ 3.4); queries and workloads (§ 4); key design dimensions such as the storage backend (§ 5.1), data organization (§ 5.2), and query execution (§ 5.3); and the taxonomy and design details of selected graph databases (§ 6), covering RDF stores (§ 6.2), tuple stores (§ 6.3), key-value stores (§ 6.4), document stores (§ 6.5), wide-column stores (§ 6.6), RDBMS for graphs (§ 6.7), object-oriented databases (§ 6.8), and LPG graph stores (§ 6.9). A dedicated symbol indicates categories surveyed in other publications, for example the history of graph databases, query languages, integrity constraints, graph analytics, graph streaming, and compressing graph databases.]

Fig. 1. The illustration of the considered areas of graph databases.


In general, we provide the following contributions:
• We provide the first taxonomy of graph databases¹, identifying and analyzing key dimensions
in the design of graph database systems.
• We use our taxonomy to survey, categorize, and compare 51 graph database systems.
¹Lists of graph databases can be found at:
http://nosql-database.org
https://database.guide
https://www.g2.com/categories/graph-databases
https://www.predictiveanalyticstoday.com/top-graph-databases
https://db-engines.com/en/ranking/graph+dbms

• We discuss in detail the design of selected graph databases.
• We outline related domains, such as queries and workloads in graph databases.
• We discuss future challenges in the design of graph databases.

1.1 Related Surveys


There exist several surveys dedicated to the theory of graph databases. In 2008, Angles et al. [10]
described the history of graph databases, and, in particular, the used data models, data structures,
query languages, and integrity constraints. In 2017, Angles et al. [8] analyzed in more detail query
languages for graph databases, taking both an edge-labeled and a property graph model into
account and studying queries such as graph pattern matching and navigational expressions. In 2018,
Angles and Gutierrez provided an overview [11] of basic notions in the graph database landscape.
While related to our work, it is largely orthogonal: it focuses on historical developments
(which we exclude) and on the details of many graph database models and query languages (we
focus only on those routinely supported by existing graph database systems), and it only
sketches a few selected graph database systems at a very high level. Our work, instead, focuses primarily on
graph database systems and the details of their design, and analyzes in depth all other aspects
(graph data models, query languages, queries) through the perspective of being supported in these
systems. Also in 2018, Bonifati et al. [48] provided an in-depth investigation into querying graphs,
focusing on numerous aspects of query specification and execution. Moreover, there are surveys
that focus on NoSQL stores [70, 93, 106] and the Resource Description Framework (RDF) [173].
There is no survey dedicated to the systems aspects of graph databases, except for several brief
papers that cover small parts of the domain (brief descriptions of a few systems, concepts, or
techniques [125, 135, 136, 176, 180], a survey of graph processing ubiquity [194], and performance
evaluations of a few systems [133, 154, 220]).

2 GRAPH DATABASES AND OTHER CLASSES OF GRAPH SYSTEMS


Graph database systems are described in the literature as “systems specifically designed for man-
aging graph-like data following the basic principles of database systems, i.e., persistent data storage,
physical/logical data independence, data integrity, and consistency” [11]. However, other systems can
also store and process dynamic graphs. We now briefly discuss relations to three such classes: other
classes of databases, streaming graph frameworks, and general static graph processing systems.

2.1 Graph Databases vs. NoSQL Stores and Other Databases


NoSQL stores address various deficiencies of relational database systems, such as little support for
flexible data models [70]. Graph databases such as Neo4j can be seen as one particular type of NoSQL
stores; these systems are sometimes referred to as “native” graph databases [188]. Other types of
NoSQL systems include wide-column stores, document stores, and general key-value stores [70]. We
focus on both “native” graph databases such as Neo4j [188] and on other systems used specifically for
maintaining graphs (relational databases, object-oriented databases, NoSQL, and others). Figure 2
shows the types of considered systems.

2.2 Graph Databases vs. Graph Streaming Frameworks


In graph streaming [33], the input graph is passed as a stream of updates, which allows one to add
and remove edges in a simple way. Graph databases are related to graph streaming in that they face
graph updates of various types. Still, they usually deal with complex graph models (such as the
Labeled Property Graph (LPG) [8] or Resource Description Framework [66]) where both vertices
and edges may be of different types and may be associated with arbitrary properties. Contrarily,

[Figure 2 diagram: types of database systems — hierarchical and network systems (no longer actively developed), relational systems, object-oriented systems, NoSQL systems, and NewSQL systems. NoSQL systems include native graph stores, wide-column stores, document stores, tuple stores, key-value stores, and RDF systems; RDF systems can be implemented as NoSQL or as traditional RDBMS systems. The focus of this survey is any system used as a graph database: native graph stores as well as parts of other domains related to storing and processing graphs.]

Fig. 2. The illustration of the considered types of databases.

graph streaming frameworks focus on simple graph models where edges or vertices may have
weights and, in some cases, simple additional properties such as time stamps. Moreover, challenges
in the design of graph databases include transactional support, persistence, physical/logical data
independence, data integrity, or consistency; these topics are little related to graph streaming
frameworks.

2.3 Graph Databases vs. Static Graph Processing Systems


A lot of effort has been dedicated to static graph analytics [29, 75, 107, 155, 201, 226]. Many of
these works focus on the vertex-centric paradigms [4, 39, 126, 201]. Some works also focus on
edge-centric or linear algebra paradigms [130, 207, 211]. The key differences to graph databases
are that graph processing systems usually focus on graphs that are static and simple, i.e., do not
have rich attached data such as labels or key-value pairs (details in § 3.3). Moreover, static graph
processing systems do not focus on topics such as transactions, persistence, physical/logical data
independence, data integrity, or consistency.
Graph streaming frameworks and static graph processing systems are not covered in this work.

3 GRAPH DATA MODELS IN THE LANDSCAPE OF GRAPH DATABASES


We start with data models. This includes conceptual graph models and representations, and non-
graph models used in graph databases. Key symbols and abbreviations are shown in Table 1.

3.1 Simple Graph Model


We start with a simple graph model that is a basis for more complex and richer conceptual graph
models used in graph databases. A graph 𝐺 can be modeled as a tuple (𝑉 , 𝐸) where 𝑉 is a set of
vertices and 𝐸 ⊆ 𝑉 ×𝑉 is a set of edges. 𝐺 = (𝑉 , 𝐸) can also be denoted as 𝐺 (𝑉 , 𝐸). We have |𝑉 | = 𝑛
and |𝐸| = 𝑚. For a directed 𝐺, an edge 𝑒 = (𝑢, 𝑣) ∈ 𝐸 is a tuple of two vertices, where 𝑢 is the
out-vertex (also called “source”) and 𝑣 is the in-vertex (also called “destination”). If 𝐺 is undirected,
an edge 𝑒 = {𝑢, 𝑣 } ∈ 𝐸 is a set of two vertices. Finally, a weighted graph 𝐺 is modeled with a triple
(𝑉 , 𝐸, 𝑤); 𝑤 : 𝐸 → R maps edges to weights.

𝐺 A graph 𝐺 = (𝑉 , 𝐸) where 𝑉 is a set of vertices and 𝐸 is a set of edges.
𝑛, 𝑚 The count of vertices and edges in a graph 𝐺; |𝑉 | = 𝑛, |𝐸| = 𝑚.
𝑑, 𝑑ˆ The average degree and the maximum degree in a given graph, respectively.
P (𝑆) = 2𝑆 The power set of 𝑆: a set that contains all possible subsets of 𝑆.
AM, M The Adjacency Matrix representation. M ∈ {0, 1}𝑛,𝑛 , M𝑢,𝑣 = 1 ⇔ (𝑢, 𝑣) ∈ 𝐸.
AL, 𝐴𝑢 The Adjacency List representation and the adjacency list of a vertex 𝑢; 𝑣 ∈ 𝐴𝑢 ⇔ (𝑢, 𝑣) ∈ 𝐸.
LPG, RDF Labeled Property Graph (§ 3.3.2) and Resource Description Framework (§ 3.3.4).
KV, RDBMS Key-Value store (§ 6.4) and Relational Database Management Systems (§ 6.7).
OODBMS Object-Oriented Database Management Systems (§ 6.8).
OLTP, OLAP Online Transaction Processing (§ 4.1) and Online Analytics Processing (§ 4.1).
ACID Transaction guarantees (Atomicity, Consistency, Isolation, Durability).
Table 1. The most relevant symbols and abbreviations used in this work.

3.2 Fundamental Representations of Graph Structure


We also summarize two fundamental ways to represent the structure of connections between
vertices. Two common such graph representations of vertex neighborhoods are the adjacency
matrix format (AM) and the adjacency list format (AL). We illustrate these representations in
Figure 3.
In the AM format, a matrix M ∈ {0, 1}𝑛,𝑛 determines the connectivity of vertices: M𝑢,𝑣 = 1 ⇔
(𝑢, 𝑣) ∈ 𝐸. In the AL format, each vertex 𝑢 has an associated adjacency list 𝐴𝑢 . This adjacency list
maintains the IDs of all vertices adjacent to 𝑢; we have 𝑣 ∈ 𝐴𝑢 ⇔ (𝑢, 𝑣) ∈ 𝐸. Each adjacency list
is often stored as a contiguous array of vertex IDs.
AM uses O(𝑛²) space and can check the connectivity of two vertices in O(1) time. AL requires
O(𝑛 + 𝑚) space and can check connectivity in O(|𝐴𝑢 |) ⊆ O(𝑑ˆ) time. The AL or AM represen-
tations are used to maintain the graph structure (i.e., the neighborhoods of vertices).
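The trade-off above can be sketched in a few lines of code; this is a minimal illustration of the two representations (function names are ours), not a database-grade data structure:

```python
# Minimal sketch of the AM and AL representations for a directed graph
# with n vertices (IDs 0..n-1).

def build_am(n, edges):
    """Adjacency matrix: O(n^2) space, O(1) connectivity check."""
    M = [[0] * n for _ in range(n)]
    for (u, v) in edges:
        M[u][v] = 1
    return M

def build_al(n, edges):
    """Adjacency lists: O(n + m) space, O(|A_u|) connectivity check."""
    A = [[] for _ in range(n)]
    for (u, v) in edges:
        A[u].append(v)
    return A

edges = [(0, 1), (0, 2), (2, 1)]
M = build_am(3, edges)
A = build_al(3, edges)
assert M[0][1] == 1   # O(1) lookup in the matrix
assert 1 in A[2]      # O(|A_2|) scan of vertex 2's adjacency list
```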

3.3 Conceptual Graph Data Models Used in Graph Databases


We now introduce the conceptual graph models used by the surveyed systems; these models extend
the simple graph model from § 3.1. A simple graph model is often used in graph frameworks such
as Pregel [148] or STINGER [80]. However, it is not commonly used with graph databases.
3.3.1 Hypergraph Model. A hypergraph 𝐻 generalizes a simple graph: any of its edges can join
any number of vertices. Formally, a hypergraph is also modeled as a tuple (𝑉 , 𝐸) with 𝑉 being a set
of vertices. 𝐸 is defined as 𝐸 ⊆ (P (𝑉 ) \ {∅}) and it contains hyperedges: non-empty subsets of 𝑉 .
Hypergraphs are rarely used in graph databases and graph processing systems. In this survey, we
describe a system called HyperGraphDB (§ 6.4.2) that focuses on storing and querying hypergraphs.
3.3.2 Labeled Property Graph Model. The classical graph model, a tuple 𝐺 = (𝑉 , 𝐸), is adequate
for many problems such as computing vertex centralities [49]. However, it is not rich enough to
model various real-world problems. This is why graph databases often use the Labeled Property
Graph Model, sometimes simply called a property graph [8, 48]. In LPG, one augments the simple
graph model (𝑉 , 𝐸) with labels that define different subsets (or classes) of vertices and edges. Fur-
thermore, every vertex and edge can have any number of properties [48] (often also called attributes).
A property is a pair (𝑘𝑒𝑦, 𝑣𝑎𝑙𝑢𝑒), where key identifies a property and value is the corresponding
value of this property [48]. Formally, an LPG is defined as a tuple (𝑉 , 𝐸, 𝐿, 𝑙𝑉 , 𝑙𝐸 , 𝐾,𝑊 , 𝑝𝑉 , 𝑝 𝐸 ) where
𝐿 is the set of labels. 𝑙𝑉 : 𝑉 ↦→ P (𝐿) and 𝑙𝐸 : 𝐸 ↦→ P (𝐿) are labeling functions. Note that P (𝐿) is
the power set of 𝐿, denoting all the possible subsets of 𝐿. Thus, each vertex and edge is mapped to a

[Figure 3 diagram: an input graph shown in three representations. The Adjacency Matrix is an n × n matrix; in an unweighted graph a cell is one bit, in a weighted graph a cell is one integer. The Adjacency List uses pointers from vertices to their neighborhoods (an offset array is optional); neighborhoods can be arbitrary structures such as arrays or lists, sorted or unsorted, and the total number of elements in the adjacency data structure is 2m for an undirected graph and m for a directed graph. The Edge List (sorted or unsorted) stores one tuple per edge; the number of tuples is 2m (undirected) or m (directed).]

Fig. 3. Illustration of fundamental graph representations: Adjacency Matrix, Adjacency List, and Edge List.

subset of labels. Next, a vertex as well as an edge can be associated with any number of properties.
We model a property as a key-value pair 𝑝 = (𝑘𝑒𝑦, 𝑣𝑎𝑙𝑢𝑒), where 𝑘𝑒𝑦 ∈ 𝐾 and 𝑣𝑎𝑙𝑢𝑒 ∈ 𝑊 . 𝐾 and 𝑊
are sets of all possible keys and values. Finally, 𝑝𝑉 (𝑢) denotes the set of property key-value pairs
of the vertex 𝑢, 𝑝 𝐸 (𝑒) denotes the set of property key-value pairs of the edge 𝑒. An example LPG is
in Figure 4. Note that, in LPGs, 𝐸 may be a multi-set (i.e., there may be more than a single edge
between two vertices, even with identical labels and/or key-value sets). All systems considered in this
work use some variant of the LPG, with the exception of RDF systems or when explicitly discussed.
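As an illustration of the LPG definition above, a minimal in-memory encoding might look as follows; the class and field names are our own, not taken from any surveyed system:

```python
from dataclasses import dataclass, field

# A hypothetical in-memory encoding of the LPG tuple
# (V, E, L, l_V, l_E, K, W, p_V, p_E): every vertex and edge carries
# a set of labels (l_V, l_E) and a dict of key-value properties (p_V, p_E).

@dataclass
class Vertex:
    vid: int
    labels: set = field(default_factory=set)        # l_V(v), a subset of L
    properties: dict = field(default_factory=dict)  # p_V(v): keys K -> values W

@dataclass
class Edge:
    src: int
    dst: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

# The data mirrors Figure 4.
alice = Vertex(0, {"Person"}, {"name": "Alice", "age": 21})
bob   = Vertex(1, {"Person"}, {"name": "Bob", "age": 24})
knows = Edge(0, 1, {"knows"}, {"since": "09.08.2007"})

assert "Person" in alice.labels
assert knows.properties["since"] == "09.08.2007"
```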

[Figure 4 diagram: two :Person vertices (name = Alice, age = 21; name = Bob, age = 24) connected by a :knows edge (since = 09.08.2007). A :Message:Post vertex (title = Holidays, text = We had...) and a :Message:Comment vertex (text = Wow! ...) are connected to the persons via :hasCreator edges, and the comment is linked to the post via a :replyOf edge.]

Fig. 4. The illustration of an example Labeled Property Graph (LPG). Vertices and edges can have labels (bold,
prefixed with a colon) and properties (key = value). We present a subgraph of a social network, where a person can know
other persons, post messages, and comment on others' messages.

3.3.3 Variants of Labeled Property Graph Model. Several databases support variants of LPG. First,
Neo4j [188] (a graph database described in detail in § 6.9.1) supports an arbitrary number of labels
for vertices. However, it only allows one label (called edge-type) per edge. Next, ArangoDB [18]
(a graph database described in detail in § 6.5.2) only allows one label per vertex (vertex-type) and
one label per edge (edge-type). This facilitates the separation of vertices and edges into different
document collections. Moreover, edge-labeled graphs [8] do not allow for any properties and use
labels in a restricted way. Specifically, only edges have labels and each edge has exactly one label.
Formally, 𝐺 = (𝑉 , 𝐸, 𝐿), where 𝑉 is the set of vertices and 𝐸 ⊆ 𝑉 × 𝐿 × 𝑉 is the set of edges. Note
that this definition enables two vertices to be connected by multiple edges with different labels.
Finally, some effort was dedicated to LPG variants that facilitate storing historical graph data [57].
3.3.4 Resource Description Framework. The Resource Description Framework [66] is a collection of
specifications for representing information. It was introduced by the World Wide Web Consortium
(W3C) in 1999 and the latest version (1.1) of the RDF specification was published in 2014. Its goal is
to provide a simple format that enables easy data exchange between systems using different data formats.
It is especially useful as a description of irregularly connected data. The core part of the RDF
model is a collection of triples. Each triple consists of a subject, a predicate, and an object. Thus, RDF
databases are also often called triple stores (or triplestores). Subjects can either be identifiers (called
Uniform Resource Identifiers (URIs)) or blank nodes (which are dummy identifiers for internal
use). Objects can be URIs, blank nodes, or literals (which are simple values). With triples, one can
connect identifiers with identifiers or identifiers with literals. The connections are named with
another URI (the predicate). RDF triples can be formally described as
(𝑠, 𝑝, 𝑜) ∈ (𝑈𝑅𝐼 ∪ 𝑏𝑙𝑎𝑛𝑘) × 𝑈𝑅𝐼 × (𝑈𝑅𝐼 ∪ 𝑏𝑙𝑎𝑛𝑘 ∪ 𝑙𝑖𝑡𝑒𝑟𝑎𝑙)
𝑠 represents a subject, 𝑝 models a predicate, and 𝑜 represents an object. 𝑈𝑅𝐼 is the set of Uniform
Resource Identifiers; 𝑏𝑙𝑎𝑛𝑘 is a set of blank node identifiers, which internally substitute for URIs
to allow for more complex data structures; 𝑙𝑖𝑡𝑒𝑟𝑎𝑙 is a set of literal values [113, 173].
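The triple structure above can be illustrated with a minimal sketch; the `ex:` identifiers are hypothetical URIs and the set-based store is our own simplification:

```python
# A minimal triple-store sketch: triples (s, p, o), where subjects are URIs
# or blank nodes, predicates are URIs, and objects are URIs, blanks, or literals.

triples = set()

def add_triple(s, p, o):
    triples.add((s, p, o))

add_triple("ex:alice", "ex:knows", "ex:bob")   # URI -> predicate -> URI
add_triple("ex:alice", "ex:age", 21)           # URI -> predicate -> literal

# A simple pattern match: all objects related to the subject ex:alice.
objects = {o for (s, p, o) in triples if s == "ex:alice"}
assert objects == {"ex:bob", 21}
```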
3.3.5 Transformations between LPG and RDF. To represent a Labeled Property Graph in the RDF
model, LPG vertices are mapped to URIs (❶) and then RDF triples are used to link those vertices with
their LPG properties by representing a property key and a property value with, respectively, an RDF
predicate and an RDF object (❷). For example, for a vertex with an ID vertex-id and a corresponding
property with a key property-key and a value property-value, one creates an RDF triple (vertex-id,
property-key, property-value). Similarly, one can represent edges from the LPG graph model in the
RDF model by giving each edge the URI status (❸), and by linking edge properties with specific
edges analogously to vertices: (edge-id, property-key, property-value) (❹). Then, one has to use two

triples to connect each edge to any of its adjacent vertices (❺). Finally, LPG labels can also be
transformed into RDF triples in a way similar to that of properties [27], by creating RDF triples for
vertices (❻) and edges (❼) such that the predicate becomes a “label” URI and contains the string
name of this label. Figure 5 shows an example of transforming an LPG graph into RDF triples. More
details on transformations between LPG and RDF are provided by Hartig [111].
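The transformation steps above can be sketched as follows; the URIs "from", "to", and "label" follow Figure 5, while the layout of the input data structures is our own illustrative choice:

```python
# Sketch of the LPG-to-RDF mapping described above (steps 1-7).

def lpg_to_rdf(vertices, edges):
    """vertices: {vid: (labels, props)}; edges: {eid: (src, dst, labels, props)}."""
    triples = []
    for vid, (labels, props) in vertices.items():   # step 1: vertex IDs become URIs
        for k, v in props.items():                  # step 2: vertex properties
            triples.append((vid, k, v))
        for lab in labels:                          # step 6: vertex labels
            triples.append((vid, "label", lab))
    for eid, (src, dst, labels, props) in edges.items():  # step 3: edge IDs become URIs
        triples.append((eid, "from", src))          # step 5: two triples connect an
        triples.append((eid, "to", dst))            #         edge to its endpoints
        for k, v in props.items():                  # step 4: edge properties
            triples.append((eid, k, v))
        for lab in labels:                          # step 7: edge labels
            triples.append((eid, "label", lab))
    return triples

t = lpg_to_rdf({"V1": ({"Person"}, {"name": "Alice"})},
               {"E1": ("V1", "V2", {"knows"}, {"since": "09.08.2007"})})
assert ("E1", "from", "V1") in t and ("V1", "label", "Person") in t
```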

[Figure 5 diagram: the LPG graph with two :Person vertices (Alice, Bob) connected by a :knows edge (since = 09.08.2007), shown next to its RDF equivalent, in which the vertices, the edge, their labels, and their properties are all represented as triples over the URIs "V-ID", "E-ID", "age", "name", "type", "from", "to", "since", and "label".]

Fig. 5. Comparison of an LPG and an RDF graph: a transformation from LPG to RDF. "V-ID", "E-ID", "age", "name",
"type", "from", "to", "since", and "label" are RDF URIs. Numbers in black circles refer to transformation steps in § 3.3.5.

If all vertices and edges only have one label, one can omit the triples for labels and store the label
(e.g., “Person”) together with the vertex or the edge name (“V-ID” and “E-ID”) in the identifier. We
illustrate a corresponding example in Figure 6.

[Figure 6 diagram: the same LPG graph as in Figure 5, transformed to RDF with each label folded into the corresponding identifier (e.g., "Person/V-ID", "knows/E-ID"), which removes the need for separate label triples.]

Fig. 6. Comparison of an LPG and an RDF graph: a transformation from LPG to RDF, given that vertices and edges have
only one label. "Person/V-ID", "knows/E-ID", "age", "name", "type", "from", "to", and "since" are RDF URIs.

Transforming RDF data into the LPG model is more complex, since RDF predicates, which would
normally be translated into edges, are URIs. Thus, while deriving an LPG graph from an RDF graph,
one must map edges to vertices and link such vertices, otherwise the resulting LPG graph may
be disconnected. There are several schemes for such an RDF to LPG transformation, for example
deriving an LPG graph which is bipartite, at the cost of an increased graph size [113]. Details and
examples are provided in a report by Hayes [113].

3.4 Non-Graph Data Models and Storage Schemes Used in Graph Databases
In addition to the conceptual graph models, graph databases also often incorporate different storage
schemes and data models that do not target specifically graphs but are used in various systems
to model and store graphs. These models include collections of key-value pairs, documents, and
tuples (used in different types of NoSQL stores), relations and tables (used in traditional relational
databases), and objects (used in object-oriented databases). Different details of these models and the

database systems based on them are described in other surveys, for example in a recent publication
on NoSQL stores by Davoudian et al. [70]. Thus, we omit extensive discussions and instead offer
brief summaries, focusing on how they are used to model or represent graphs.
3.4.1 Collection of Key-Value Pairs. Key-value stores are the simplest NoSQL stores [70]. Here,
the data is stored as a collection of key-value pairs, with a focus on high-performance, highly
scalable lookups based on keys. The exact form of both keys and values depends on the specific
system or application. Keys can be simple (e.g., a URI or a hash) or structured. Values are often
encoded as byte arrays (i.e., the structure of values is usually schema-less). However, a key-value
store can also impose some additional data layout, structuring the schema-less values [70].
Due to the general nature of key-value stores, there can be many ways of representing a graph as
a collection of key-value pairs. We describe several concrete example systems [72, 120, 186, 199] in § 6.4.
For example, one can use vertex labels as keys and encode the neighborhoods of vertices as values.
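A minimal sketch of one such encoding follows; the key format and the use of a plain dict as a stand-in for a key-value store are illustrative assumptions:

```python
import json

kv = {}  # stand-in for a key-value store

def put_neighborhood(vertex_key, neighbors):
    # Values are schema-less byte arrays; here, a JSON-serialized neighborhood.
    kv[f"adj:{vertex_key}"] = json.dumps(neighbors).encode()

def get_neighborhood(vertex_key):
    raw = kv.get(f"adj:{vertex_key}")
    return json.loads(raw) if raw is not None else []

put_neighborhood("v1", ["v2", "v3"])
assert get_neighborhood("v1") == ["v2", "v3"]
```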
3.4.2 Collection of Documents. A document is a fundamental storage unit in a class of NoSQL
databases called document stores [70]. These documents are stored in collections. Multiple col-
lections of documents constitute a database. A document is encoded using a selected standard
semi-structured format, e.g., JSON [50] or XML [51]. Document stores extend key-value stores in
that a document can be seen as a value that has a certain flexible schema. This schema consists of
attributes, where each attribute has a name along with one or more values. Such a structure based
on documents with attributes allows for various value types, key-value pair storage, and recursive
data storage (attribute values can be lists or key-value dictionaries).
In all surveyed document stores [18, 53, 88, 141, 161] (§ 6.5), each vertex is stored in a vertex
document. The capability of documents to store key-value pairs is used to store vertex labels and
properties within the corresponding vertex document. The details of edge storage, however, are
system-dependent: edges can be stored in the document corresponding to the source vertex of each
edge, or in the documents of the destination vertices. As documents do not impose any restriction
on what key-value pairs can be stored, vertices and edges may have different sets of properties.
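A hypothetical vertex document following this scheme is shown below; the field names are illustrative, and we embed outgoing edges in the source vertex's document, one of the two system-dependent options mentioned above:

```python
import json

# A vertex document: labels and properties are key-value pairs inside the
# document, and the attribute values can themselves be lists or dictionaries.
vertex_doc = {
    "_id": "persons/alice",
    "label": "Person",
    "name": "Alice",
    "age": 21,
    "out_edges": [
        {"to": "persons/bob", "label": "knows", "since": "09.08.2007"}
    ],
}

serialized = json.dumps(vertex_doc)   # documents are stored in a format such as JSON
assert json.loads(serialized)["age"] == 21
```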
3.4.3 Collection of Tuples. Tuples are a basis of NoSQL stores called tuple stores. A tuple store
generalizes an RDF store: RDF stores are restricted to triples (or – in some cases – 4-tuples, also
referred to as quads) whereas tuple stores can contain tuples of an arbitrary size. Thus, the number
of elements in a tuple is not fixed and can vary, even within a single database. Each tuple has an ID
which may also be a direct memory pointer.
A collection of tuples can model a graph in different ways. For example, one tuple of size 𝑛 can
store pointers to other tuples that contain neighborhoods of vertices. The exact mapping between
such tuples and graph data is specific to different databases; we describe an example [224] in § 6.3.
3.4.4 Collection of Tables. Tables are the basis of Relational Database Management Systems
(RDBMS) [22, 64, 114]. Tables consist of rows and columns. Each row represents a single data
element, for example a car. A single column usually defines a certain data attribute, for example the
color of a car. Some columns can define unique IDs of data elements, called primary keys. Primary
keys can be used to implement relations between data elements. A one-to-one or a one-to-many
relation can be implemented with a single additional column that contains the copy of a primary
key of the related data element (such primary key copy is called the foreign key). A many-to-many
relation can be implemented with a dedicated table containing foreign keys of related data elements.
To model a graph as a collection of tables, one can implement vertices and edges as rows in two
separate tables. Each vertex has a unique primary key that constitutes its ID. Edges can relate to
their source or destination vertices by referring to their primary keys (as foreign keys). LPG labels

and properties, as well as RDF predicates, can be modeled with additional columns [225, 229]. We
present and analyze different graph database systems [23, 171] based on tables in § 6.6 and § 6.7.
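A minimal sketch of this relational encoding, using SQLite; the schema and column names are our own, not those of any surveyed system:

```python
import sqlite3

# Vertices and edges as rows in two tables; edge endpoints refer to the
# vertices' primary keys as foreign keys.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE vertices (id INTEGER PRIMARY KEY, label TEXT, name TEXT);
    CREATE TABLE edges (
        id INTEGER PRIMARY KEY, label TEXT,
        src INTEGER REFERENCES vertices(id),
        dst INTEGER REFERENCES vertices(id));
""")
db.execute("INSERT INTO vertices VALUES (1, 'Person', 'Alice')")
db.execute("INSERT INTO vertices VALUES (2, 'Person', 'Bob')")
db.execute("INSERT INTO edges VALUES (1, 'knows', 1, 2)")

# A neighborhood query: names of all vertices Alice has a 'knows' edge to.
rows = db.execute("""
    SELECT v.name FROM edges e JOIN vertices v ON v.id = e.dst
    WHERE e.src = 1 AND e.label = 'knows'
""").fetchall()
assert rows == [('Bob',)]
```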
3.4.5 Collection of Objects. One can also use collections of objects in Object-Oriented Database
Management Systems (OODBMS) [21] to model graphs. Here, data elements and their relations
are implemented as objects linked with some form of pointers. The details of modeling graphs as
objects heavily depend on specific designs. We provide details for an example system [223] in § 6.8.
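In the spirit of such designs, a schematic sketch (hypothetical classes, not the layout of any particular OODBMS) in which vertices and edges are plain language objects linked by references:

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []   # direct references to Edge objects

class Edge:
    def __init__(self, src, dst, label):
        self.src, self.dst, self.label = src, dst, label
        src.out_edges.append(self)  # the link is simply an object pointer

alice, bob = Vertex("Alice"), Vertex("Bob")
Edge(alice, bob, "knows")

# Traversal follows pointers directly, with no ID lookup or index involved.
print([e.dst.name for e in alice.out_edges])  # -> ['Bob']
```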

4 QUERIES & WORKLOADS IN THE LANDSCAPE OF GRAPH DATABASES


We describe graph database workloads.

4.1 OLAP and OLTP


First, one distinguishes between Online Transactional Processing (OLTP) and Online Analytical
Processing (OLAP). OLTP queries are small, interactive, transactional, and local in scope (i.e., they
process only a small part of the graph). Examples are neighborhood queries, lookups, inserts,
deletes, and updates of single (or a few) vertices and edges. OLAP queries are usually not processed
at interactive speeds, as they are inherently complex and global in scope (i.e., they span the whole graph). Examples are PageRank [175] or Breadth-First Search (BFS).
Now, static graph processing systems (§ 2.3) focus on OLAP. However, many graph databases also
support a rich set of OLAP workloads. This includes Neo4j [188], Cray Graph Engine [187], Amazon
Neptune [5], TigerGraph [218], and many others (see Tables 2-3). For example, Neo4j provides
algorithms for vertex centrality (e.g., PageRank, Betweenness Centrality, Eigenvector Centrality),
community detection (e.g., Louvain, Triangle Counting, Weakly Connected Components, Label
Propagation), graph traversals (BFS, DFS), shortest paths (e.g., Delta Stepping, A∗ ), and many others.
Thus, we focus on both OLTP and OLAP, in the context of how they are supported by graph databases.
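The contrast can be illustrated on a toy adjacency-list graph (a schematic sketch with made-up data): an OLTP-style query touches a single neighborhood, whereas an OLAP-style BFS is global in scope.

```python
from collections import deque

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# OLTP-style: a small, local, interactive lookup (one vertex's neighborhood).
def neighborhood(v):
    return graph[v]

# OLAP-style: a BFS that is global in scope, visiting every reachable vertex.
def bfs(root):
    dist, queue = {root: 0}, deque([root])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

print(neighborhood("A"))  # -> ['B', 'C']
print(bfs("A"))           # -> {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```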

4.2 Graph Queries Beyond OLAP vs. OLTP


We also offer an analysis of graph queries beyond the simple distinction into OLAP and OLTP
classes. Figure 7 illustrates the queries in the context of accessing the LPG graph. We omit detailed
discussions and examples as they are provided in different associated papers (query languages [7, 8],
analytics workloads [15], benchmarks related to certain aspects [62, 145, 146], whole systems [9, 20, 26, 56, 85, 121, 124, 212], and surveys on system performance [76, 154]). Instead, our goal is to
deliver a broad overview, and point the reader to the detailed material available elsewhere.
4.2.1 Scopes of Graph Queries. We describe queries in the increasing order of their scope. We
focus on the LPG model, see § 3.3.2. Figure 7 depicts the scope of graph queries.
Local Queries Local queries involve single vertices or edges. For example, given a vertex or an
edge ID, one may want to retrieve the labels and properties of this vertex or edge. Other examples
include verifying whether a given vertex or a given edge has a given label (given the label name),
or whether they have a property of a specified type. These queries are used in social network
workloads [20, 26] (e.g., to fetch the profile information of a user) and in benchmarks [124] (e.g., to
measure the vertex look-up time).
Neighborhood Queries Neighborhood queries retrieve all edges adjacent to a given vertex, or
the vertices adjacent to a given edge. This query can be further restricted by, for example, retrieving
only the edges with a specific label. Similarly to local queries, social networks often require a
retrieval of the friends of a given person, which results in querying the local neighborhood [20, 26].
Traversals In a traversal query, one explores a part of the graph beyond a single neighborhood.
These queries usually start at a single vertex (or a small set of vertices) and traverse some graph
[Figure 7 arranges queries by scope and complexity: local queries and neighborhood queries over single vertices and edges (OLTP; LDBC interactive workloads), traversals over subgraphs, paths, and patterns (LDBC Business Intelligence workloads), and global graph analytics over the whole graph (OLAP; LDBC graph analytics workloads).]
Fig. 7. Illustration of different query scopes and their relation to other graph query taxonomy aspects, in the context of
accessing a Labeled Property Graph.

part. We call the initial vertex or the set of vertices the anchor or root of the traversal. Queries can
restrict what edges or vertices can be retrieved or traversed. As this is a common graph database
task, this query is also used in different performance benchmarks [62, 76, 124].
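A restricted traversal can be sketched as follows (hypothetical data; the anchor is vertex "A", only edges labeled "knows" may be traversed, and the traversal is bounded in depth):

```python
graph = {  # adjacency lists of (neighbor, edge label) pairs
    "A": [("B", "knows"), ("C", "worksAt")],
    "B": [("D", "knows")],
    "C": [],
    "D": [],
}

def traverse(anchor, edge_label, max_depth):
    # Depth-bounded traversal from the anchor, restricted to one edge label.
    visited, frontier = {anchor}, [anchor]
    for _ in range(max_depth):
        nxt = []
        for v in frontier:
            for w, lbl in graph[v]:
                if lbl == edge_label and w not in visited:
                    visited.add(w)
                    nxt.append(w)
        frontier = nxt
    return visited

print(sorted(traverse("A", "knows", 2)))  # -> ['A', 'B', 'D']
```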
Global Graph Analytics Finally, we identify global graph analytics queries, which by definition
consider the whole graph (not necessarily every property but all vertices and edges). Different
benchmarks [30, 56, 76, 154] take these large-scale queries into account since they are used in
different fields such as threat detection [79] or computational chemistry [25]. As indicated in
Tables 2 and 3, many graph databases support such queries. Graph processing systems such as
Pregel [148] or Giraph [15] focus specifically on resolving global analytics [107]. Example queries
include resolving global pattern matching [60, 202], shortest paths [74], max-flow or min-cut [67],
minimum spanning trees [134], diameter, eccentricity, connected components, PageRank [175],
and many others. Some traversals can also be global (e.g., finding all shortest paths of unrestricted
length), thus falling into the category of global analytics queries. Global analytics workloads have
been the subject of numerous research efforts in the last decade [29, 37, 38, 126, 155, 178, 204].
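As an example of a global analytics query, a few power-iteration steps of PageRank [175] over a toy graph (a textbook sketch; the surveyed systems implement far more elaborate variants):

```python
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
n, d = len(graph), 0.85
rank = {v: 1.0 / n for v in graph}

# Power iteration: every vertex spreads its rank along its out-edges,
# damped by d; the query is global because it touches all vertices and edges.
for _ in range(50):
    nxt = {v: (1 - d) / n for v in graph}
    for v, out in graph.items():
        for w in out:
            nxt[w] += d * rank[v] / len(out)
    rank = nxt

# Ranks sum to 1; vertex C collects the most rank in this toy graph.
print(round(sum(rank.values()), 6))  # -> 1.0
```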

4.2.2 Classes of Graph Workloads. We also outline an existing taxonomy of graph database
workloads that is provided as a part of the LDBC benchmarks [9]. LDBC is an effort by academia
and industry to establish a set of standard benchmarks for measuring the performance of graph
databases. The effort currently specifies interactive workloads, Business Intelligence workloads, and
graph analytics workloads.
Interactive Workloads A part of LDBC called the Social Network Benchmark (SNB) [85]
identifies and analyzes interactive workloads that can collectively be described as either read-only
queries or simple transactional updates. They are divided into three further categories. First, short
read-only queries start with a single graph element (e.g., a vertex) and lookup its neighbors or
conduct small traversals. Second, complex read-only queries traverse larger parts of the graph; they
are used in the LDBC benchmark to not just assess the efficiency of the data retrieval process but
also the quality of query optimizers. Finally, transactional update queries insert, modify, or remove
either a single element (e.g., a vertex), possibly together with its adjacent edges, or a single edge.
This workload tests common graph database operations such as the lookup of a friend profile in a
social network, or friendship removal.
Business Intelligence Workloads Next, LDBC identifies Business Intelligence (BI) workloads [212],
which fetch large data volumes, spanning large parts of a graph. In contrast to the interactive workloads, the BI workloads heavily use summarization and aggregation operations such as sorting,
counting, or deriving minimum, maximum, and average values. They are read-only. The LDBC spec-
ification provides an extensive list of BI workloads that were selected so that different performance
aspects of a database are properly stressed when benchmarking.
Graph Analytics Workloads Finally, the LDBC effort comes with a graph analytics bench-
mark [121], where six graph algorithms are proposed as a standard benchmark for a graph analytics
part of a graph database. These algorithms are Breadth-First Search, PageRank, weakly connected
components [99], community detection using label propagation [47], deriving the local clustering
coefficient [197], and computing single-source shortest paths [74].
LDBC Workloads The LDBC interactive workloads correspond to local, neighborhood, and
traversals. The LDBC Business Intelligence workloads range from traversals to global graph analytics
queries. The LDBC graph analytics benchmark corresponds to global graph analytics.
4.2.3 Graph Patterns and Navigational Expressions. Angles et al. [8] inspected in detail the theory
of graph queries. In one identified family of graph queries, called simple graph pattern matching,
one prescribes a graph pattern (e.g., a specification of a class of subgraphs) that is then matched to
the graph maintained by the database, searching for the occurrences of this pattern. This query
can be extended with aggregation and a projection function to so-called complex graph pattern
matching. Next, path queries allow searching for paths of arbitrary length in the graph. One can
also combine complex graph pattern matching and path queries, resulting in navigational graph
pattern matching, in which a graph pattern can be applied recursively on the parts of the path.
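Simple graph pattern matching can be illustrated in a few lines (hypothetical data; the pattern is a single labeled edge between two vertices with prescribed labels):

```python
# Graph as a list of labeled edges plus vertex labels (made-up dataset).
vertex_labels = {"v1": "Person", "v2": "Person", "v3": "City"}
edges = [("v1", "knows", "v2"), ("v1", "livesIn", "v3")]

def match(src_label, edge_label, dst_label):
    # Simple graph pattern matching: find all edges realizing the pattern
    # (:src_label)-[:edge_label]->(:dst_label).
    return [(s, t) for s, lbl, t in edges
            if lbl == edge_label
            and vertex_labels[s] == src_label
            and vertex_labels[t] == dst_label]

print(match("Person", "knows", "Person"))  # -> [('v1', 'v2')]
```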
4.2.4 Input Loading. Finally, certain benchmarks also analyze bulk input loading [62, 76, 124].
Specifically, given an input dataset, they measure the time to load this dataset into a database. This
scenario is common when data is migrated between systems.

5 TAXONOMY OF GRAPH DATABASE SYSTEMS


We now describe how we categorize graph database systems considered in this survey [5, 16, 18, 19,
23, 44, 53, 55, 68, 72, 88, 92, 101, 120, 127, 141, 144, 150, 151, 161, 164, 169, 170, 181, 186–188, 199, 223–
225, 227, 228]. This taxonomy incorporates existing concepts related to graph data models (cf. § 3)
and to graph queries (cf. § 4); the remaining aspects of the taxonomy are novel. In this
section, we describe the taxonomy in a general way. In § 7, we analyze the taxonomy in the context
of specific graph database systems. The main dimensions of the taxonomy are (1) the general
backend type, (2) data organization, and (3) query execution. Figure 8 illustrates the general types
of considered databases together with certain aspects of data models and organization. Figure 9
summarizes all elements of the proposed taxonomy.

5.1 Types of Graph Database Storage Backends


We first identify general types of graph databases that primarily differ in their storage backends
(e.g., a triple store or a document store). This facilitates further taxonomization and analysis because
(1) the backend design has a profound impact on almost all other aspects of a graph database such
as data organization, and because (2) it straightforwardly enables categorizing all considered graph
databases into a few clearly defined groups.
Some classes of systems use a certain specific backend technology, adapting this backend to
storing graph data, and adding a frontend to query the graph data. Examples of such systems are
tuple stores, document stores, key-value stores, wide-column stores, Relational Database Management Systems (RDBMS), or Object-Oriented Database Management Systems (OODBMS).

[Figure 8 depicts, for each backend type, the layout of vertices and edges, with example systems: row- and column-oriented RDBMS (Oracle Spatial and Graph; SAP HANA) store vertices and edges in rows of two tables; OODBMS (Objectivity ThingSpan, VelocityGraph) store them as language objects; native triple stores based on RDF (AllegroGraph, Cray Graph Engine) link subject and object URIs via predicates; native graph stores based on LPG (Neo4j, Sparksee/DEX, TigerGraph, GraphBase, Memgraph, PGX) are custom systems optimized for graph storage and traversal queries, maintaining adjacency information explicitly; key-value stores (Dgraph, HyperGraphDB, MS Graph Engine) encode vertices and edges in values indexed by keys; wide-column stores (Titan, JanusGraph, DSE Graph) store each vertex in a row indexed by a unique ID, with its properties, labels, and adjacent edges in row cells; document stores (OrientDB, ArangoDB, Azure Cosmos DB, FaunaDB) encode vertices and edges in documents (e.g., JSON) linked via pointers or document IDs; tuple stores (WhiteDB, Graphd) store vertices and edges in tuples linked via pointers or IDs of other tuples; data hubs (Cayley, InfoGrid, MarkLogic, OpenLink Virtuoso, Stardog) combine multiple models and/or storage schemes.]

Fig. 8. Overview of different graph storage backends, with examples.

[Figure 9 summarizes the taxonomy: (1) System backends: native (triple) store, native (LPG) store, tuple store, document store, key-value store, wide-column store, RDBMS, OODBMS. (2) Data organization: graph representation (AM or AL); conceptual graph models (RDF triples, LPG); edges stored within vertex records or within separate edge records; fixed- or variable-sized records; records linked with direct pointers or with IDs/references; support for lightweight edges. (3) Data distribution: distributed mode; data sharding; data replication. (4) Indexes: implementation (tree, hashtable, skip list) and purpose (internal and external data indexes, neighborhood indexes, structural indexes). (5) Query execution: supported queries (OLTP, OLAP, or both); concurrent execution of multiple queries; parallelization of a single query; ACID support (full, partial, none); query languages (SPARQL, Gremlin, Cypher, SQL, GraphQL, other).]

Fig. 9. Overview of the identified taxonomy of graph databases.


Other graph databases are designed specifically for maintaining and querying graphs; we call such
systems native graph databases (or native graph stores), they are based on either the LPG or
the RDF graph data model. Finally, we consider designs called data hubs; they enable using
many different storage backends, facilitating storing data in different formats and models.
Some of the above categories of systems fall into the domain of NoSQL stores. For example, this
includes document stores, key-value stores, or some triple stores. However, there is no strict assign-
ment of specific storage backends as NoSQL. For example, triple stores can also be implemented as,
e.g., RDBMS [70]. Figure 8 illustrates these systems; they are discussed in more detail in § 6.

5.2 Data Organization
In data organization, we distinguish the following taxonomy aspects: (1) graph structure represen-
tation, (2) conceptual data models, (3) indexes, (4) data distribution, and (5) common optimizations.
First, in graph structure representation, we analyze whether the graph structure is stored
using the AL or the AM representation (see § 3.2). The graph structure representation directly
impacts the performance of queries.
Second, we investigate what conceptual data models are supported by different graph databases.
Here, we focus on the RDF and LPG models as well as their variants, described in § 3.3. The used
graph model strongly influences what graph query languages can be used together with a given
system, and it also has impact on the associated data layout.
We also analyze how graph databases use indexes to accelerate accessing data. Indexes can
significantly improve the performance of GDBs, and they are widely used, for example in RDF
systems [2, 131, 182]. Here, we consider the functionality (i.e., the use case) of a given index, and
how a given index is implemented². As for the former, we identify four different index use cases:
storing the locations of vertex neighborhoods (referred to as “neighborhood indexes”), indexing
graph elements, such as vertices, that satisfy pre-specified conditions related to rich graph data
(referred to as “data indexes”), storing the actual graph data, and maintaining non-graph related
data (referred to as “structural indexes”). As for the latter, we identify three fundamental data
structures used to implement indexes: trees, skip lists, and hashtables.
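For instance, a neighborhood index can be as simple as a hashtable mapping a vertex ID to the location of its adjacency data (a schematic sketch; the layout and offsets are made up):

```python
# Adjacency data packed into one flat array; the index maps a vertex ID
# to the (offset, degree) of its neighborhood within that array.
adjacency = [1, 2, 3,  0, 3,  0]   # neighbors of v0, v1, v2 back to back
neighborhood_index = {0: (0, 3), 1: (3, 2), 2: (5, 1)}

def neighbors(v):
    off, deg = neighborhood_index[v]   # one hashtable lookup ...
    return adjacency[off:off + deg]    # ... then a contiguous scan

print(neighbors(1))  # -> [0, 3]
```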
We also identify whether a database can run in a distributed mode. A system is distributed or
multi-server if it can run on multiple servers (also called compute nodes) connected with a network.
In such systems, data may be replicated [94] (maintaining copies of the dataset at each server), or it
may allow for sharding [84] (data fragmentation, i.e., storing only a part of the given dataset on
one server). Replication often allows for more fault tolerance [83], sharding reduces the amount of
used memory per node and can improve performance [83]. Gathering this information facilitates
selecting a system with the most appropriate performance properties in a given context. For
example, systems that replicate but do not shard the data may offer higher performance for read-only
workloads, but may scale badly for particularly large graphs that would require disk spilling.
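A common scheme, sketched below under assumed parameters (four shards, two replicas; not the policy of any specific system), is to hash vertex IDs onto servers and replicate each shard on its neighbors in a ring:

```python
NUM_SHARDS, REPLICAS = 4, 2

def home_shard(vertex_id):
    # Hash partitioning: every vertex (with its adjacency list) gets one home shard.
    return hash(vertex_id) % NUM_SHARDS

def replica_shards(vertex_id):
    # Replication: copies are kept on the next shards in a ring, for fault tolerance.
    h = home_shard(vertex_id)
    return [(h + i) % NUM_SHARDS for i in range(REPLICAS)]

shards = replica_shards("vertex-42")
print(len(shards), len(set(shards)))  # -> 2 2
```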
Finally, we identify common optimizations: (1) dividing data into records, (2) lightweight
edges, and (3) linking records with direct pointers.
Dividing Data into Records Graph databases usually organize data into small units called
records. One record contains information about a single entity (e.g., a person); this information is organized into specified logical fields (a name, a surname, etc.). A certain number of records
is often kept together in one contiguous block in memory or disk to enhance data access locality.
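The idea can be sketched with fixed-size binary records packed into one contiguous block (a toy layout with made-up field widths):

```python
import struct

# A fixed-size record: 4-byte vertex ID, 16-byte name, 4-byte first-edge offset.
RECORD = struct.Struct("<I16sI")

block = bytearray(RECORD.size * 2)    # contiguous block holding two records
RECORD.pack_into(block, 0,           1, b"Alice", 100)
RECORD.pack_into(block, RECORD.size, 2, b"Bob",   200)

def read_record(i):
    # Fixed-size records make random access trivial: offset = i * record size.
    vid, name, edge_off = RECORD.unpack_from(block, i * RECORD.size)
    return vid, name.rstrip(b"\x00").decode(), edge_off

print(read_record(1))  # -> (2, 'Bob', 200)
```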
Enabling Lightweight Edges Some systems (e.g., OrientDB) allow edges without labels or
properties to be stored as lightweight edges. Such edges are stored in the records of the corresponding
source and/or destination vertices. These lightweight edges are represented by the ID of their
destination vertex, or by a pointer to this vertex. This can save storage space and accelerate resolving
different graph queries such as verifying connectivity of two vertices [54].
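A hypothetical vertex record with lightweight edges might inline the destination IDs directly, so that a connectivity check is resolved without touching any edge record (a schematic sketch):

```python
# Vertex records; lightweight edges are bare destination vertex IDs inlined
# in the source vertex record (no separate edge record, no label, no
# properties), while rich edges would live in separate edge records.
vertices = {
    1: {"name": "Alice", "lightweight_out": [2, 3]},
    2: {"name": "Bob",   "lightweight_out": []},
    3: {"name": "Carol", "lightweight_out": [2]},
}

def connected(u, v):
    # The connectivity check is resolved entirely inside u's vertex record.
    return v in vertices[u]["lightweight_out"]

print(connected(1, 2), connected(2, 1))  # -> True False
```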
Linking Records with Direct Pointers In record based systems, vertices and edges are stored
in records. To enable efficient resolution of connectivity queries (i.e., verifying whether two vertices
are connected), these records have to point to other records. One option is to store direct pointers
(i.e., memory addresses) to the respective connected records. For example, an edge record can store
direct pointers to vertex records with adjacent vertices. Another option is to assign each record
a unique ID and use these IDs instead of direct pointers to refer to other records. On one hand,
²We do not include the index information in Tables 2–3 because of lack of space, and instead provide a detailed separate analysis in § 7.2.7.

this requires an additional indexing structure to find the physical location of a record based on its
ID. On the other hand, if the physical location changes, it is usually easier to update the indexing
structure instead of changing all associated direct pointers.
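The two linking options can be contrasted directly (a schematic sketch; Python object references stand in for memory addresses):

```python
# Option 1: records linked with direct pointers (object references stand in
# for memory addresses); following a link is a single dereference.
node_b = {"name": "B"}
node_a = {"name": "A", "next_ptr": node_b}
print(node_a["next_ptr"]["name"])  # -> B

# Option 2: records linked with IDs; an indexing structure resolves
# an ID to the physical record, costing one extra lookup per hop.
index = {"id-b": {"name": "B"}}
node_a2 = {"name": "A", "next_id": "id-b"}
print(index[node_a2["next_id"]]["name"])  # -> B

# If a record is relocated, only its single index entry changes; every
# record storing the ID "id-b" remains valid without any updates.
index["id-b"] = {"name": "B", "location": "new block"}
print(index[node_a2["next_id"]]["location"])  # -> new block
```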
Note that specific systems can employ other diverse optimizations. For example, in addition to
using index structures to maintain the locations of data, some databases also store the graph data
in the indexes themselves. In such cases, the index does not point to a certain data record but the
index itself contains the desired data. Example systems with such functionality are Sparksee/DEX
and Cray Graph Engine. To maintain indices, the former uses bitmaps and B+ trees while the latter
uses hashtables.

5.3 Query Execution


In query execution, we identify the following aspects: (1) concurrent execution of different queries,
(2) parallelization of single queries, (3) ACID transactions, (4) support for classes of queries, and (5)
support for query languages.
Note that almost all of the studied graph databases are closed source or do not come with
any associated discussions of the details of the design of query execution (except for general
descriptions). Thus, we do not offer a detailed associated taxonomy for algorithmic aspects of query
execution, beyond the above criteria. However, we provide a detailed associated discussion on a few
systems that do come with more details on their query execution. We also analyze relationships
between the backend type, the data organization, and the query execution. This enables deriving
certain insights about the design of different backends. For example, the query language support is
primarily affected by the supported conceptual graph model; if it is RDF, then the system usually
supports SPARQL while systems focusing on LPG usually support Cypher or Gremlin.
We define concurrent execution as the execution of separate queries at the same time. Concur-
rent execution of queries can lead to higher throughput. We also define parallel execution as the
parallelized execution of a single query, possibly on more than one server or compute node. Parallel
execution can lead to lower latencies for queries that can be parallelized. In § 7.3, we correlate
the support for concurrent and parallel queries with different fundamental backend types, and we
describe the details of query execution in graph databases that disclose this information.
Many graph databases support transactions; we analyze them in § 7.3.2. ACID [115] (Atomicity,
Consistency, Isolation, Durability) is a well-known set of properties that database transactions
uphold in many database systems. Different graph databases explicitly ensure some or all of ACID.
We also analyze supported queries in § 7.3.3. Some databases (e.g., ArangoDB [18]) are oriented
towards OLTP, where the focus is on executing many smaller, interactive, transactional queries. Other
systems (e.g., Cray Graph Engine [187]) focus more on OLAP: they execute analytics queries that
span the whole graph, usually taking more time than OLTP queries. Finally, different databases (e.g.,
Neo4j [188]) offer extensive support for both.
Although we do not focus on graph database languages, we also report on supported query
languages in § 7.3.4. We consider the leading languages such as SPARQL [177], Gremlin [189],
Cypher [91, 103, 116], and SQL [69]. We also mention other system-specific languages such as
GraphQL [112] and support for APIs from languages such as C++ or Java³. Note that mapping
graph queries to SQL was also addressed in past work [209].

6 ANALYSIS OF DATABASE SYSTEMS


We survey and describe selected graph database systems with respect to the proposed taxonomy. In
each system category, we describe selected representative systems, focusing on the associated graph
³We bring to the reader's attention a manifesto on creating GQL, a standardized graph query language (https://gql.today).

Graph Database Model Repr. Data Organization Data Distribution & Query Execution
oB Additional remarks
System
lpg rdf al am fs vs dp se sv lw ms rp sh ce pe tr oltp olap
NATIVE GRAPH DATABASES (RDF model based, triple stores) (§ 6.2). The main data model used: RDF triples (§ 3.3.4).
AllegroGraph [92] é é é é ∗é é é é é é ? ∗ Triples are stored as integers (RDF strings
map to integers).
BlazeGraph [44] é ∗ ∗é é ? ? é é é é ? ? ? ? ∗ BlazeGraph uses RDF*, an extension of RDF
(details in § 6.2.2).
Cray Graph Engine [187] é é é é é∗ é∗ é é é é é é é é ∗ RDF triples are stored in hashtables.
Amazon Neptune [5] é é é ? ? é é é é é é —
AnzoGraph [55] é é é ? ? é é é é é —
Apache Jena TDB [17] é é ? é ? ? ? ? ? ? é ? é é ? —
Apache Marmotta [16] é é é é ∗é é é é é ? ? ? ∗ The structure of data records is based on
that of different RDBMS systems
(H2 [163], PostgreSQL [162], MySQL [78]).
BrightstarDB [164] é é é é ? ? é é é é ? ? ? ? ? —
gStore [231] é é é é é é é ? ? é ? é ? ? ? —
Ontotext GraphDB [169] é é é é ? ? é é é é é ? ? —
Profium Sense [181] é é ∗é é ? ? é é é é ? ? ? ∗ The format used is called JSON-LD:
JSON for vertices and RDF for edges.
TripleBit [228] é é é é é ∗ é é é é é‡ é é é é ? ? The data organization uses compression.
∗ Strings map to variable size integers.
‡ Described as future work.

NATIVE GRAPH DATABASES (LPG model based) (§ 6.9). The main data model used: LPG (§ 3.3.2, § 3.3.3).
Neo4j [188] é é é é é é é é Neo4j is provided as a cloud service by a
system called Graph Story [102].
Sparksee/DEX [151] é é é∗ é é‡ é‡ é é é é é ∗ Bitmaps are used for connectivity.
‡ The system uses maps only.
GBase [127] é é∗ é é‡ é é é é é é ? ? ? ? ? é ∗ GBase supports simple graphs only (§ 3.1).
‡ GBase stores the AM sparsely.
GraphBase [86] é é∗ é ? é é ? ? ? ? ? ? ? ∗ No support for edge properties, only two
types of edges available.
Graphflow [128] é é é ? ? ? ? ? ? é ? ? ? ? ? ? —
LiveGraph [230] é é é é é é é é ? é ? —
Memgraph [156] é é é ? ? ? ? ? ? ∗ ‡ ∗ This feature is under development.
‡ Available only for some algorithms.
TigerGraph [218] é é ? é ? ? ? ? ? ? —
Weaver [77] é é ? é ? ? ? ? ? ? —
KEY-VALUE STORES (§ 6.4). The main data model used: key-value pairs (§ 3.4.1).

HyperGraphDB [120] é é∗ é é‡ é é é é é † ∗ A Hypergraph model. ‡ The system uses


an incidence index to retrieve edges of a
vertex. † Support for ACI only.
MS Graph Engine [199] é ∗ é é ‡é é é é ∗ AL contains IDs of edges and/or vertices.
‡ Schema is defined by Trinity
Specification Language (TSL).
Dgraph [72] é é é é é é é Dgraph is based on Badger [71].
RedisGraph [183, 186, 203] é é é é é é é é é é é é é ∗ RedisGraph is based on Redis [185].
∗ The OLAP part uses GraphBLAS [130].

DOCUMENT STORES (§ 6.5). The main data model used: documents (§ 3.4.2).
ArangoDB [18] é é∗ é é é é é é ∗ Uses a hybrid index for retrieving edges.
OrientDB [53] é ∗ é é é ‡ é ∗ AL contains RIDs (i.e., physical locations)
of edge and vertex records. ‡ Sharding is
user defined. OrientDB supports JSON and
it offers certain object-oriented capabilities.
Azure Cosmos DB [161] é é é é é é é é ? —
Bitsy [141] é é é é é é é é é é é é é The system is disk based and uses JSON files.
The storage only allows for appending data.
FaunaDB [88] ∗ é ‡é é é é é é ∗ Document, RDBMS, graph, “time series”.
‡ Adjacency lists are separately precomputed.

Table 2. Comparison of graph databases (native graph databases based on RDF and LPG, key-value stores, and
document stores). Bolded systems are described in more detail in the corresponding sections. oB: A system supports
secondary data models / backend types (in addition to its primary one). lpg, rdf: A system supports, respectively, the
Labeled Property Graph and RDF without prior data transformation. am, al: The structure is represented as the
adjacency matrix or the adjacency list. fs, vs: Data records are fixed size and variable size, respectively. dp: A system
can use direct pointers to link records. This enables storing and traversing adjacency data without maintaining indices.
se: Edges can be stored in a separate edge record. sv: Edges can be stored in a vertex record. lw: Edges can be
lightweight (containing just a vertex ID or a pointer, both stored in a vertex record). ms: A system can operate in a
Multi-Server (distributed) mode. rp: Given a distributed mode, a system enables Replication of datasets. sh: Given a
distributed mode, a system enables Sharding of datasets. ce: Given a distributed mode, a system enables Concurrent
Execution of multiple queries. pe: Given a distributed mode, a system enables Parallel Execution of single queries on
multiple nodes/CPUs. tr: Support for ACID Transactions. oltp: Support for Online Transaction Processing. olap:
Support for Online Analytical Processing. : A system offers a given feature. : A system offers a given feature in a
limited way. é: A system does not offer a given feature. ?: Unknown.
Graph Database oB Model Repr. Data Organization Data Distribution & Query Execution
Additional remarks
System lpg rdf al am fs vs dp se sv lw ms rp sh ce pe tr oltp olap
RELATIONAL DBMS (RDBMS) (§ 6.7). The main data model used: tables (implementing relations) (§ 3.4.4).
Oracle Spatial ∗ é é ?∗ ?∗ é ∗ é é ∗ LPG and RDF use row-oriented storage.
and Graph [171] The system can also run on top of PGX [117]
(effectively as a native graph database).
AgensGraph [43] é é é ? ? é ? é é AgensGraph is based on PostgreSQL.
FlockDB [219] é é é é é ? ? é ? é é é é é The system focuses on “shallow” graph
queries, such as finding mutual friends.
IBM Db2 é é é é ? ? é ∗ ∗ ‡ ‡ ‡ ‡ ‡ ‡ ? ∗ can store vertices/edges in the same table.
Graph [217] ‡ inherited from the underlying IBM Db2™.
MS SQL Server é é é ? ? é ? é é The system uses an SQL graph extension.
2017 [160]
OQGRAPH [149] é é é é ?∗ ?∗ é ∗ é é é ? OQGRAPH uses MariaDB [28].
∗ OQGRAPH uses row-oriented storage.
SAP HANA [195] é é é é∗ é∗ é é∗ é é ∗ SAP HANA is column-oriented, edges and
vertices are stored in rows. SAP HANA can
be used with a dedicated graph engine [193];
it offers some capabilities of a JSON document
store [195]
SQLGraph [210] é é é é ? ∗é ‡ é † † † † † † ? ∗ SQLGraph uses JSON for property storage.
‡ SQLGraph uses row-oriented storage.
† depends on the used SQL engine.

WIDE-COLUMN STORES (§ 6.6). The main data model used: key-value pairs and tables (§ 3.4.1, § 3.4.4).
JanusGraph [23] é é é é é é é JanusGraph is the continuation of Titan.
Titan [23] é é é é é é Enables various backends (e.g.,
Cassandra [140]).
DSE Graph é é é é é é é ? ∗ DSE Graph is based on Cassandra [140].
(DataStax) [68] ∗ Support for AID, Consistency is configurable.
HGraphDB [227] é é é é é é é é ? ? é∗ HGraphDB uses TinkerPop3 with HBase [97].
∗ ACID is supported only within a row.

TUPLE STORES (§ 6.3). The main data model used: tuples (§ 3.4.3).

WhiteDB [224] é é ∗ ‡é é ∗ ‡ ‡ ‡é é é † ? ∗ Implicit support for triples of integers.


‡ Implementable by the user. † Transactions
use a global shared/exclusive lock.
Graphd [101] é é ∗é é ? ? ? é é ? ? ‡? ? Backend of Google Freebase.
∗ Implicit support for triples. ‡ Subset of ACID.

OBJECT-ORIENTED DATABASES (OODBMS) (§ 6.8). The main data model used: objects (§ 3.4.5).
Velocity- é é é é é The system is based on VelocityDB [222]
Graph [223]
Objectivity é é ? ? ? ? ? ? ? The system is based on ObjectivityDB [105].
ThingSpan [167]
DATA HUBS (§ 6.10). The main data model used: several different ones.
MarkLogic [150] é∗ é é ? é ∗ é é ? Supported storage/models: relational tables,
RDF, various documents. ∗ Vertices are stored
as documents, edges are stored as RDF triples.
OpenLink é é é ? ? é é é é ∗ Supported storage/models: relational tables
Virtuoso [170] and RDF triples. ∗ This feature can be used
with relational data only.
Cayley [58] ? ? ? ? ? ? ? é ? ∗ ? ? Supported storage/models: relational tables,
RDF, document, key-value. ∗ This feature
depends on the backend.
InfoGrid [119] é ? ? ? ? é é ? ∗ ? ? Supported storage/models: relational tables,
Hadoop’s filesystem, grid storage. ∗ A weaker
consistency model is used instead of ACID.
Stardog [208] ∗ ∗é é ? é ∗ é é é ? Supported storage/models: relational tables,
documents. ∗ RDF is simulated on relational
tables. Both LPG and RDF are enabled
through virtual quints.

Table 3. Comparison of graph databases (RDBMS, wide-column stores, tuple stores, OODBMS, data hubs).
Bolded systems are described in more detail in the corresponding sections. oB: A system supports secondary data models /
backend types (in addition to its primary one). lpg, rdf: A system supports, respectively, the Labeled Property Graph and
RDF without prior data transformation. am, al: The structure is represented as the adjacency matrix or the adjacency
list. fs, vs: Data records are fixed size and variable size, respectively. dp: A system can use direct pointers to link
records. This enables storing and traversing adjacency data without maintaining indices. se: Edges can be stored in a
separate edge record. sv: Edges can be stored in a vertex record. lw: Edges can be lightweight (containing just a vertex
ID or a pointer, both stored in a vertex record). ms: A system can operate in a Multi Server (distributed) mode. rp: Given a
distributed mode, a system enables Replication of datasets. sh: Given a distributed mode, a system enables Sharding
of datasets. ce: Given a distributed mode, a system enables Concurrent Execution of multiple queries. pe: Given a
distributed mode, a system enables Parallel Execution of single queries on multiple nodes/CPUs. tr: Support for ACID
Transactions. oltp: Support for Online Transaction Processing. olap: Support for Online Analytical Processing. :
A system offers a given feature. : A system offers a given feature in a limited way. é: A system does not offer a given
feature. ?: Unknown.

Graph Database System Graph Database Query Language Other languages and additional remarks
SPARQL Gremlin Cypher SQL GraphQL Progr. API
NATIVE GRAPH DATABASES (RDF model based, triple stores) (§ 6.2).
AllegroGraph é é é é é é
Amazon Neptune é é é é é
AnzoGraph é é é é é
Apache Jena TDB é é é é (Java) é
Apache Marmotta é é é é é Apache Marmotta also supports its native LDP and LDPath languages.
BlazeGraph ∗ é é é é ∗ BlazeGraph offers SPARQL* to query RDF*.
BrightstarDB é é é é é é
Cray Graph Engine é é é é é é
gStore é é é é é é
Ontotext GraphDB é é é é é é
Profium Sense é é é é é é
TripleBit é é é é é é
NATIVE GRAPH DATABASES (LPG model based) (§ 6.9).
Gbase é é é é é é
GraphBase é é é é é é GraphBase uses its native query language.
Graphflow é é ∗‡ é é é ∗ Graphflow supports a subset of Cypher [159]. ‡ Graphflow supports
Cypher++ extension with subgraph-condition-action triggers [128].
LiveGraph é é é é é é No focus on languages and queries.
Memgraph é é ∗ é é é ∗ openCypher.
Neo4j é ∗ é ‡ † ∗ Gremlin is supported as a part of TinkerPop integration.
‡ GraphQL supported with the GRANDstack layer.
† Neo4j can be embedded in Java applications.
Sparksee/DEX é é é é (.NET) ∗ Sparksee/DEX also supports C++, Python, Objective-C, and Java APIs.

TigerGraph é é é é é é TigerGraph uses GSQL [218].
Weaver é é é é é (C)∗ ∗ Weaver also supports C++, Python.

TUPLE STORES (§ 6.3).


Graphd é é é é é é Graphd uses MQL [101].
WhiteDB é é é é é (C)∗ ∗ WhiteDB also supports Python.

DOCUMENT STORES (§ 6.5).


ArangoDB é é é é é ArangoDB uses AQL (ArangoDB Query Language).
Azure Cosmos DB é é é é é
Bitsy é é é é é Bitsy also supports other Tinkerpop-compatible languages such as
SQL2Gremlin and Pixy.
FaunaDB é é é é é é
OrientDB ∗ é (Java)‡ ∗ An SQL extension for graph queries. ‡ OrientDB offers bindings to C,
JavaScript, PHP, .NET, Python, and others.
KEY-VALUE STORES (§ 6.4).
Dgraph é é é é ∗ é ∗ A variant of GraphQL.
HyperGraphDB é é é é é (Java) é
MS Graph Engine é é é é é é MS Graph Engine uses LINQ [199].
RedisGraph é é é é é é
WIDE-COLUMN STORES (§ 6.6).
DSE Graph (DataStax) é é é é é DSE Graph also supports CQL [68].
HGraphDB é é é é é é
JanusGraph é é é é é é
Titan é é é é é é
RELATIONAL DBMS (RDBMS) (§ 6.7).

AgensGraph é é ∗ ‡ é é ∗ A variant called openCypher [104, 152]. ‡ ANSI-SQL.


FlockDB é é é é é FlockDB uses the Gizzard framework and MySQL.
IBM Db2 Graph é ∗ é é (Java)‡ ∗ IBM Db2 Graph supports only graph queries whose results can be
returned to rows. ‡ IBM Db2 Graph also supports Scala, Python and
Groovy.
MS SQL Server 2017 é é é ∗ é é ∗ Transact-SQL.
OQGRAPH é é é é é é
Oracle Spatial and Graph é é ∗ é é ∗ PGQL [221], an SQL-like graph query language.
SAP HANA é é é ∗ é ‡ ∗ SAP HANA offers bindings to Rust, ODBC, and others.
‡ GraphScript, a domain-specific graph query language.
SQLGraph é ∗ é ‡ é é ∗ SQLGraph doesn’t support Gremlin side effect pipes.
‡ Graph is encoded in a way specific to SQLGraph.

OBJECT-ORIENTED DATABASES (OODBMS) (§ 6.8).


Objectivity ThingSpan é é é é é é Objectivity ThingSpan uses a native DO query language [167].
VelocityGraph é é é é é (.NET) é
DATA HUBS (§ 6.10).
Cayley é ∗ é é é ∗ Cayley supports Gizmo, a Gremlin dialect [58].
Cayley also uses MQL [58].
InfoGrid é é é é é (REST) é
MarkLogic é é é é é é MarkLogic uses XQuery [46].
OpenLink Virtuoso é é é é OpenLink Virtuoso also supports XQuery [46], XPath v1.0 [63],
and XSLT v1.0 [129].
Stardog ∗ é é é ∗ Stardog supports the Path Query extension [208].

Table 4. Support for different graph database query languages in different graph database systems. “Progr. API”
determines whether a given system supports formulating queries using some native programming language such as C++.
“ ”: A system supports a given language. “ ”: A system supports a given language in a limited way. “é”: A system does
not support a given language.
model, as well as data and storage organization. Tables 2 and 3 illustrate the details of different
graph database systems, including the ones described in this section4 . The tables indicate which
features are supported by which systems. We use symbols “ ”, “ ”, and “é” to indicate that a
given system offers a given feature, offers a given feature in a limited way, and does not offer a
given feature, respectively. “?” indicates we were unable to infer this information based on the
available documentation. We report the support for different graph query languages in Table 4.
Finally, we analyze different taxonomy aspects in § 7.

6.1 Discussion on Selection Criteria


When selecting systems for this survey, we use two criteria. First, we use the
DB-Engines Ranking5 to select the most popular systems in each considered backend category.
Second, we pick interesting research systems (e.g., SQLGraph [210], LiveGraph [230], or Weaver [77])
that are not included in this ranking. For detailed discussions, we additionally consider the
availability of technical details (many systems are closed source or do not publish their design details).

6.2 RDF Stores (Triple Stores)


RDF stores [1, 52, 108, 165], also called triple stores, implement the Resource Description Framework
(RDF) model (§ 3.3.4). These systems organize data into triples. We now describe in more detail
a selected recent RDF store, Cray Graph Engine (§ 6.2.1). We also provide more details on two
other systems, AllegroGraph and BlazeGraph, focusing on variants of the RDF model used in these
systems (§ 6.2.2).

6.2.1 Cray Graph Engine. Cray Graph Engine (CGE) [187] is a triple store that can scale to a
trillion RDF triples. CGE does not store triples but quads (4-tuples), where the fourth element is a
graph ID. Thus, one can store multiple graphs in one CGE database. Quads in CGE are grouped
by their predicate and the identifier of the graph that they are a part of. Thus, only a pair with a
subject and an object needs to be stored for one such group of quads. These subject/object pairs are
stored in hashtables (one hashtable per group). Since each subject and object is represented as a
unique 48-bit integer identifier (HURI), the subject/object pairs can be packed into 12 bytes and
stored in a 32-bit unsigned integer array, ultimately reducing the amount of needed storage.
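To make the packing concrete: two 48-bit HURIs occupy 96 bits, i.e., exactly three 32-bit words (12 bytes). The following sketch (the helper names are ours, not CGE's) packs and unpacks such a subject/object pair:

```python
def pack_pair(subject: int, obj: int) -> list[int]:
    """Pack two 48-bit identifiers (HURIs) into three 32-bit words (12 bytes)."""
    assert 0 <= subject < 2**48 and 0 <= obj < 2**48
    combined = (subject << 48) | obj  # a single 96-bit value
    return [(combined >> 64) & 0xFFFFFFFF,
            (combined >> 32) & 0xFFFFFFFF,
            combined & 0xFFFFFFFF]

def unpack_pair(words: list[int]) -> tuple[int, int]:
    """Recover the (subject, object) pair from three 32-bit words."""
    combined = (words[0] << 64) | (words[1] << 32) | words[2]
    return combined >> 48, combined & (2**48 - 1)
```

Storing three 32-bit words per pair, instead of two 64-bit integers, is what reduces the storage from 16 to 12 bytes per subject/object pair.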

6.2.2 AllegroGraph and BlazeGraph. There exist many other RDF graph databases. We briefly
describe two systems that extend the original RDF model: AllegroGraph and BlazeGraph.
First, some RDF stores allow for attaching attributes to a triple explicitly. AllegroGraph [92]
allows an arbitrary set of attributes to be defined per triple when the triple is created. However,
these attributes are immutable. Figure 10 presents an example RDF graph with such attributes. This
figure uses the same LPG graph as in previous examples provided in Figure 5 and Figure 6, which
contain example transformations from the LPG into the original RDF model.
Second, BlazeGraph [44] implements RDF* [109, 110], an augmentation of RDF that allows for
attaching triples to other triples (see Figure 11). Vertices can use triples for storing labels and
properties, analogously as with the plain RDF. However, with RDF*, one can represent LPG edges
more naturally than in the plain RDF. Specifically, edges can be stored as triples, and edge properties
can be linked to the edge triple via other triples.

4We encourage participation in this survey. In case the reader is in possession of additional information relevant for the
tables, the authors would welcome the input.
5 https://db-engines.com/en/ranking/graph+dbms


Fig. 10. Comparison of an LPG graph and an RDF graph: a transformation from LPG to RDF with triple attributes.
We represent the triple attributes as a set of key-value pairs. “Person/V-ID”, “age”, “name”, “type” and “knows” are RDF
URIs. The transformation uses the assumption that there is one label per vertex and edge.


Fig. 11. Comparison of an LPG graph and an RDF* graph: a transformation from LPG to RDF*, which enables attaching
triples to other triples. “Person/V-ID”, “age”, “name”, “type”, “since” and “knows” are RDF URIs. The transformation
uses the assumption that there is one label per vertex and edge.

6.3 Tuple Stores


A tuple store is a generalization of an RDF store. RDF stores are restricted to triples (or quads, as in
CGE) whereas tuple stores can maintain tuples of arbitrary sizes, as detailed in § 3.4.3.

6.3.1 WhiteDB. WhiteDB [224] is a tuple store that enables allocating new records (tuples) with
an arbitrary tuple length (number of tuple elements). Small values and pointers to other tuples
are stored directly in a given field. Large strings are kept in a separate store. Each large value
is only stored once, and a reference counter keeps track of how many tuples refer to it at any
time. WhiteDB only enables accessing single tuple records: there is no higher-level query engine
or graph API that would allow one to, for example, execute a query that fetches all neighbors of a
given vertex. However, one can use tuples as vertex and edge storage, linking them to one another
via memory pointers. This facilitates fast resolution of various queries about the structure of an
arbitrary irregular graph structure in WhiteDB. For example, one can store a vertex 𝑣 with its
properties as consecutive fields in a tuple associated with 𝑣, and maintain pointers to selected
neighborhoods of 𝑣 in 𝑣’s tuple. More examples on using WhiteDB (and other tuple stores such as
Graphd) for maintaining graph data can be found online [157, 224].
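The pointer-linking idea can be sketched as follows (a toy in-memory model with field layout invented by us, not WhiteDB's C API): vertex tuples hold small values inline and reference neighbor tuples directly, so a neighborhood is resolved by following references rather than consulting an index.

```python
class TupleStore:
    """A toy tuple store: records are tuples of arbitrary length."""
    def __init__(self):
        self.records = []

    def create_record(self, length):
        rec = [None] * length  # fields are filled in by the user
        self.records.append(rec)
        return rec

store = TupleStore()
alice = store.create_record(3)  # layout (ours): [name, age, neighbor ref]
bob = store.create_record(3)
alice[0], alice[1] = "Alice", 21
bob[0], bob[1] = "Bob", 24
alice[2] = bob                  # a direct reference, standing in for a pointer

# Following the reference resolves Alice's neighbor without any index:
assert alice[2][0] == "Bob"
```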

6.4 Key-Value Stores


One can also explicitly use key-value (KV) stores for maintaining a graph (cf. § 3.4.1). We provide
details of using a collection of key-value pairs to model a graph in § 3.4.1. Here, we describe selected
KV stores used as graph databases: MS Graph Engine (also called Trinity) and HyperGraphDB.

6.4.1 Microsoft’s Graph Engine (Trinity). Microsoft’s Graph Engine [199] is based on a distributed
KV store called Trinity. Trinity implements a globally addressable distributed RAM storage. In
Trinity, keys are called cell IDs and values are called cells. A cell can hold data items of different data
types, including IDs of other cells. MS Graph Engine introduces a graph storage layer on top of the
Trinity KV storage layer. Vertices are stored in cells, where a dedicated field contains a vertex ID or
a hash of this ID. Edges adjacent to a given vertex 𝑣 are stored as a list of IDs of 𝑣’s neighboring
vertices, directly in 𝑣’s cell. However, if an edge holds rich data, such an edge (together with the
associated data) can also be stored in a separate dedicated cell.
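A minimal key-value sketch of this storage scheme (the cell layout and field names are illustrative, not MS Graph Engine's API): keys are cell IDs and values are cells; lightweight edges are plain neighbor IDs inside the vertex cell, while an edge with rich data gets its own dedicated cell.

```python
kv = {}  # cell ID -> cell (a dict standing in for a Trinity cell)

kv[1] = {"name": "Alice", "light_edges": [2], "rich_edges": [100]}
kv[2] = {"name": "Bob", "light_edges": [], "rich_edges": []}
kv[3] = {"name": "Carol", "light_edges": [], "rich_edges": []}
kv[100] = {"src": 1, "dst": 3, "since": "09.08.2007"}  # rich-data edge cell

def neighbors(vertex_cell_id):
    cell = kv[vertex_cell_id]
    # Lightweight edges store neighbor IDs directly; rich edges are
    # resolved through one extra cell lookup.
    return cell["light_edges"] + [kv[e]["dst"] for e in cell["rich_edges"]]

assert neighbors(1) == [2, 3]
```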
6.4.2 HyperGraphDB. HyperGraphDB [120] stores hypergraphs (definition in § 3.3.1). The
basic building blocks of HyperGraphDB are atoms, the values of the KV store. Every atom has
a cryptographically strong ID. This reduces the chance of collisions (i.e., creating identical IDs for
different graph elements by different peers in a distributed environment). Both hypergraph vertices
and hyperedges are atoms. Thus, they have their own unique IDs. An atom of a hyperedge stores a
list of IDs corresponding to the vertices connected by this hyperedge. Vertices and hyperedges also
have a type ID (i.e., a label ID) and they can store additional data (such as properties) in a recursive
structure (referenced by a value ID). This recursive structure contains value IDs identifying other
atoms (with other recursive structures) or binary data. Figure 12 shows an example of how a KV
store is used to represent a hypergraph in HyperGraphDB.


Fig. 12. An example utilization of key-value stores for maintaining hypergraphs in HyperGraphDB (a type is a
term used in HyperGraphDB to refer to a label).
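The atom layout of Figure 12 can be mimicked with a plain dictionary (IDs here are random rather than cryptographically strong, and the field names are ours): every vertex and hyperedge is an atom with its own ID, and a hyperedge atom lists the IDs of all vertices it connects.

```python
import uuid

atoms = {}  # atom ID -> atom

def new_atom(type_id, value, members=()):
    """Create an atom; 'members' is non-empty only for hyperedges."""
    atom_id = uuid.uuid4().hex  # stand-in for a cryptographically strong ID
    atoms[atom_id] = {"type": type_id, "value": value,
                      "members": list(members)}
    return atom_id

a = new_atom("Person", {"name": "Alice"})
b = new_atom("Person", {"name": "Bob"})
c = new_atom("Person", {"name": "Carol"})
# One hyperedge connecting three vertices at once:
meeting = new_atom("met_at", {"where": "Zurich"}, members=[a, b, c])

assert len(atoms[meeting]["members"]) == 3
```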

6.5 Document Stores


In document stores, a fundamental storage unit is a document, described in § 3.4.2. We select two
document stores for a more detailed discussion, OrientDB and ArangoDB.
6.5.1 OrientDB. In OrientDB [53], every document 𝑑 has a Record ID (RID), consisting of the ID
of the collection of documents where 𝑑 is stored, and the position (also referred to as the offset) within
this collection. Pointers (called links) between documents are represented using these unique RIDs.
OrientDB [53] introduces regular edges and lightweight edges. Regular edges are stored in an
edge document and can have their own associated key-value pairs (e.g., to encode edge properties
or labels). Lightweight edges, on the other hand, are stored directly in the document of the adjacent
(source or destination) vertex. Such edges do not have any associated key-value pairs. They consti-
tute simple pointers to other vertices, and they are implemented as document RIDs. Thus, a vertex
document not only stores the labels and properties of the vertex, but also a list of lightweight edges
(as a list of RIDs of the documents associated with neighboring vertices), and a list of pointers to
the adjacent regular edges (as a list of RIDs of the documents associated with these regular edges).
Each regular edge has pointers (RIDs) to the documents storing the source and the destination
vertex. Each vertex stores a list of links (RIDs) to its incoming and the outgoing edges.

Figure 13 contains an example of using documents for representing vertices, regular edges, and
lightweight edges in OrientDB. Figure 14 shows example vertex and edge documents.

Fig. 13. Two vertex documents connected with a lightweight edge and a regular edge (knows) in OrientDB.


Fig. 14. Example OrientDB vertex and edge documents (complex JSON documents are also supported).
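A toy model of this design (field names are illustrative, not OrientDB's schema): a RID is the pair (collection ID, position within the collection), so a link resolves to a record in constant time without an index; lightweight edges are bare RIDs in the vertex document, while regular edges are separate documents that can carry key-value pairs.

```python
collections = {"V": [], "E": []}

def insert(coll, doc):
    collections[coll].append(doc)
    return (coll, len(collections[coll]) - 1)  # the document's RID

def fetch(rid):
    coll, pos = rid
    return collections[coll][pos]  # direct positional access, no index

alice = insert("V", {"name": "Alice", "out_light": [], "out_regular": []})
bob = insert("V", {"name": "Bob"})
# Lightweight edge: just the neighbor's RID inside the vertex document.
fetch(alice)["out_light"].append(bob)
# Regular edge: its own document, with properties and vertex RIDs.
knows = insert("E", {"since": "09.08.2007", "out": alice, "in": bob})
fetch(alice)["out_regular"].append(knows)

assert fetch(fetch(alice)["out_light"][0])["name"] == "Bob"
```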

6.5.2 ArangoDB. ArangoDB [18, 19] keeps its documents in a binary format called VelocyPack,
which is a compacted implementation of JSON documents. Documents can be stored in different
collections and have a _key attribute, which is a unique ID within a given collection. Unlike
OrientDB's RIDs, these IDs are not direct memory pointers. For maintaining graphs, ArangoDB uses vertex
collections and edge collections. The former are regular document collections with vertex documents.
Vertex documents store no information about adjacent edges; this has the advantage that a vertex
document does not have to be modified when one adds or removes edges. The latter store
edge documents. Edge documents have two particular properties, _from and _to, which are
the IDs of the documents associated with the two vertices connected by a given edge. An optimization
in ArangoDB's design avoids reading vertex documents during traversals: one edge document
can be accessed directly based on a vertex ID found within another edge document. This may improve cache efficiency
and thus reduce query execution time [19].
One can use different collections of documents to store different edge types (e.g., “friend_of” or
“likes”). When retrieving edges conditioned on some edge type (e.g., “friend_of”), one does not have
to traverse the whole adjacency list (all “friend_of” and “likes” edges). Instead, one can target the
collection with the edges of the specific edge type (“friend_of”).
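This layout can be sketched as follows (illustrative data structures, not the AQL or driver API): vertex documents carry no adjacency information, and each edge type lives in its own edge collection with _from/_to attributes, so a type-restricted query scans only the relevant collection.

```python
persons = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"},
           "carol": {"name": "Carol"}}

# One edge collection per edge type:
friend_of = [{"_from": "alice", "_to": "bob"}]
likes = [{"_from": "alice", "_to": "carol"}]

def out_neighbors(edge_collection, vertex_key):
    """Scan only the given edge collection, never the vertex documents."""
    return [e["_to"] for e in edge_collection if e["_from"] == vertex_key]

# Only the "friend_of" collection is touched; "likes" edges are never read:
assert out_neighbors(friend_of, "alice") == ["bob"]
```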

6.6 Wide-Column Stores


Wide-column stores combine different features of key-value stores and relational tables. On one
hand, a wide-column store maps keys to rows (similarly to a KV store that maps keys to values). Every row can
have an arbitrary number of cells, and every cell constitutes a key-value pair. Thus, a row contains
a mapping of cell keys to cell values, effectively making a wide-column store a two-dimensional KV
store (a row key and a cell key both identify a specific value). On the other hand, a wide-column
store is a table, where cell keys constitute column names. However, unlike in a relational database,
the names and the format of columns may differ between rows within the same table. We illustrate
an example subset of rows and cells in a wide-column store in Figure 15.


Fig. 15. An illustration of wide-column stores: mapping keys to rows and column-keys to cells within the rows.
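The two-dimensional KV view can be expressed directly (a minimal sketch with invented keys): a row key selects a row, a cell key selects a value within that row, and rows need not share cell keys.

```python
table = {}  # row key -> {cell key -> value}

def put(row_key, cell_key, value):
    table.setdefault(row_key, {})[cell_key] = value

def get(row_key, cell_key):
    return table[row_key][cell_key]  # two keys identify one value

put("v1", "prop:name", "Alice")
put("v1", "edge:knows:v2", {"since": "09.08.2007"})
put("v2", "prop:city", "Bern")  # rows may have entirely different cell keys

assert get("v1", "prop:name") == "Alice"
# Cells within a row can be enumerated in cell-key order:
assert sorted(table["v1"]) == ["edge:knows:v2", "prop:name"]
```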

6.6.1 Titan and JanusGraph. Titan [23] and its continuation JanusGraph [144] are built on top of
wide-column stores. They can use different wide-column stores as backends, for example Apache
Cassandra [14]. In both systems, when storing a graph, each row represents a vertex. Each vertex
property and adjacent edge is stored in a separate cell. One edge is thus encoded in a single cell,
including all the properties of this edge. Since cells in each row are sorted by the cell key, this
sorting order can be used to find cells efficiently. For graphs, cell keys for properties and edges are
chosen such that after sorting the cells, the cells storing properties come first, followed by the cells
containing edges (see Figure 16). Since rows are ordered by the key, both systems straightforwardly
partition tables into so-called tablets, which can then be distributed over multiple servers.


Fig. 16. An illustration of Titan and JanusGraph: using wide-column stores for storing graphs. The illustration is
inspired by and adapted from the work by Sharma [200].
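The cell-key trick can be sketched as follows (the key encoding is invented for illustration, not Titan's actual serialization): prefix values are chosen so that, under the store's sort order, all property cells of a vertex precede all edge cells.

```python
PROP, EDGE = 0x10, 0x20  # the property prefix sorts before the edge prefix

def prop_key(name):
    return (PROP, name)

def edge_key(label, neighbor_id):
    return (EDGE, label, neighbor_id)

row = {}  # the cells of one vertex row: cell key -> value
row[edge_key("knows", 7)] = {"since": "09.08.2007"}  # one edge in one cell
row[prop_key("name")] = "Alice"
row[prop_key("age")] = 21

cells = sorted(row)  # wide-column stores keep cells sorted by cell key
# All property cells come first, followed by all edge cells:
assert [k[0] for k in cells] == [PROP, PROP, EDGE]
```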

6.7 Relational Database Management Systems


Relational Database Management Systems (RDBMS) store data in two dimensional tables with rows
and columns, described in more detail in the corresponding data model section in § 3.4.4.
There are two types of RDBMS: column RDBMS (also referred to as column-oriented or columnar;
not to be confused with wide-column stores) and row RDBMS (row-oriented). They differ
in their physical data layout. In many row RDBMS, data items in memory are kept in contiguous
rows [6, 190].
are more efficient when only a few rows need to be retrieved, but with all their columns. Conversely,
column RDBMS are more efficient when many rows need to be retrieved, but only with a few
columns. Graph database solutions that use RDBMS as their backends use both row RDBMS (e.g.,
Oracle Spatial and Graph [171], OQGRAPH built on MariaDB [149]) and column RDBMS (e.g., SAP
HANA [195]).
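The tradeoff between the two layouts can be illustrated with a toy vertex table (the data is invented): the same table is stored once as contiguous rows and once as contiguous columns.

```python
# Row store: one contiguous record per vertex.
rows = [("Alice", 21, "Zurich"), ("Bob", 24, "Bern")]

# Column store: one contiguous array per attribute.
columns = {"name": ["Alice", "Bob"],
           "age": [21, 24],
           "city": ["Zurich", "Bern"]}

# Fetching one full row touches a single contiguous record in the row store:
assert rows[1] == ("Bob", 24, "Bern")
# Scanning one column over all rows touches a single contiguous array in
# the column store (e.g., an aggregate over "age"):
assert sum(columns["age"]) == 45
```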
6.7.1 Oracle Spatial and Graph. Oracle Spatial and Graph [171] is built on top of Oracle Database.
It provides a rich set of tools for administration and analysis of graph data. Oracle Spatial and
Graph comes with a range of built-in parallel graph algorithms (e.g., for community detection, path
finding, traversals, link prediction, PageRank, etc.). Both LPG and RDF models are supported. Rows

of RDBMS tables constitute vertices and relationships between these rows form edges. Associated
properties and attributes are stored as key-value pairs in separate structures.

6.8 Object-Oriented Databases


Object-oriented database management systems (OODBMS) [21] enable modeling, storing, and
managing data in the form of language objects used in object-oriented programming languages. We
summarize such objects in § 3.4.5.
6.8.1 VelocityGraph. VelocityGraph [223] is a graph database relying on the VelocityDB [222]
distributed object database. VelocityGraph edges, vertices, as well as edge or vertex properties are
stored in C# objects that contain references to other objects. To handle this structure, VelocityGraph
introduces abstractions such as VertexType, EdgeType, and PropertyType. Each object has a unique
object identifier (Oid), pointing to its location in physical storage. Each vertex and edge has one
type (label). Properties are stored in dictionaries. Vertices keep the adjacent edges in collections.

6.9 LPG-Based Native Graph Databases


Graph database systems described in the previous sections, with the exception of triple stores, are
all based on some database backend that was not originally built just for managing graphs. In
what follows, we describe LPG-based native graph databases: systems that were specifically built to
maintain and process graphs.
6.9.1 Neo4j: Direct Pointers. Neo4j [188] is the most popular graph database system, according
to different database rankings (see the links provided in the introduction). Neo4j implements the
LPG model using a storage design based on fixed-size records. A vertex 𝑣 is represented with a
vertex record, which stores (1) 𝑣’s labels, (2) a pointer to a linked list of 𝑣’s properties, (3) a pointer
to the first edge adjacent to 𝑣, and (4) some flags. An edge 𝑒 is represented with an edge record,
which stores (1) 𝑒’s edge type (a label), (2) a pointer to a linked list of 𝑒’s properties, (3) a pointer to
two vertex records that represent vertices adjacent to 𝑒, (4) pointers to the ALs of both adjacent
vertices, and (5) some flags. Each property record can store up to four properties, depending on the
size of the property value. Large values (e.g., long strings) are stored in a separate dynamic store.
Storing properties outside vertex and edge records allows those records to be small. Moreover, if
no properties are accessed in a query, they are not loaded at all. The AL of a vertex is implemented
as a doubly linked list. An edge is stored once, but is part of two such linked lists (one list for each
adjacent vertex). Thus, an edge has two pointers to the previous edges and two pointers to the next
edges. Figure 17 outlines the Neo4j design; Figure 18 shows the details of vertex and edge records.
A core concept in Neo4j is using direct pointers [188]: a vertex stores pointers to the physical
locations of its neighbors. Thus, for neighborhood queries or traversals, one needs no index and
can instead follow direct pointers (except for locating the root vertices of traversals). Consequently, the
query complexity does not depend on the graph size. Instead, it only depends on how large the
visited subgraph is6.
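A toy model of these records (the field names loosely follow Figure 18; this is an illustration, not Neo4j's implementation): each edge record participates in two doubly linked adjacency lists, one per adjacent vertex, and traversals follow pointers only.

```python
class VertexRecord:
    def __init__(self, labels):
        self.labels = labels
        self.first_edge = None  # direct pointer to the first adjacent edge

class EdgeRecord:
    def __init__(self, rel_type, first_vertex, second_vertex):
        self.rel_type = rel_type
        self.first_vertex = first_vertex
        self.second_vertex = second_vertex
        # One edge record, two adjacency lists (here singly linked for brevity):
        self.first_next = None   # next edge in first_vertex's list
        self.second_next = None  # next edge in second_vertex's list

def add_edge(rel_type, u, v):
    e = EdgeRecord(rel_type, u, v)
    e.first_next, u.first_edge = u.first_edge, e   # prepend to u's list
    e.second_next, v.first_edge = v.first_edge, e  # prepend to v's list
    return e

def neighbors(v):
    out, e = [], v.first_edge
    while e is not None:  # follow direct pointers only, no index lookups
        if e.first_vertex is v:
            out.append(e.second_vertex)
            e = e.first_next
        else:
            out.append(e.first_vertex)
            e = e.second_next
    return out

alice, bob = VertexRecord(["Person"]), VertexRecord(["Person"])
add_edge("knows", alice, bob)
assert neighbors(alice) == [bob] and neighbors(bob) == [alice]
```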
6.9.2 Sparksee/DEX: B+ Trees and Bitmaps. Sparksee is a graph database system that was
formerly known as DEX [151]. Sparksee implements the LPG model in the following way. Vertices
and edges (both are called objects) are identified by unique IDs. For each property name, there is
an associated B+ tree that maps vertex and edge IDs to the respective property values. The reverse
mapping from a property value to vertex and edge IDs is maintained by a bitmap, where a bit set to
6That said, if the graph does not fit into the main memory, the execution speed heavily depends on caching and cache
pre-warming, i.e., the running time may significantly increase.


Fig. 17. Summary of the Neo4j structure: two vertices linked by a “knows” edge. Both vertices maintain linked lists of
properties. The edges are part of two doubly linked lists, one linked list per adjacent vertex.

Fig. 18. An overview of the Neo4j vertex and edge records.

one indicates that the corresponding ID has some property value. Labels and vertex/edge IDs are
mapped to each other in a similar way. Moreover, for each vertex, two bitmaps are stored: one
bitmap indicates the incoming edges, and another one the outgoing edges. Furthermore, two B+
trees maintain the information about what vertices an edge is connected to (one tree for each edge
direction). Figure 19 illustrates example mappings.
Sparksee is one of the few systems that are not record based. Instead, Sparksee uses maps
implemented as B+ trees [65] and bitmaps. The use of bitmaps allows for some operations to be
performed as bit-level operations. For example, if one wants to find all vertices with certain values
of properties such as “age” and “first name”, one can simply find two bitmaps associated with the
“age” and the “first name” properties, and then derive a third bitmap that is a result of applying a
bitwise AND operation to the two input bitmaps.
Uncompressed bitmaps could grow unmanageably in size. As most graphs are sparse, bitmaps
indexed by vertices or edges mostly contain zeros. To alleviate large sizes of such sparse bitmaps,
they are cut into 32-bit clusters. If a cluster contains a non-zero bit, it is stored explicitly. The bitmap
is then represented by a collection of (cluster-id, bit-data) pairs. These pairs are stored in a sorted
tree structure to allow for efficient lookup, insertion, and deletion.
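Both ideas can be sketched together (the data is invented): one bitmap per (property, value) marks which object IDs carry that value, so a conjunctive filter is a bitwise AND; sparse bitmaps are kept as (cluster ID, 32-bit word) pairs, with all-zero clusters omitted.

```python
def set_bit(clusters, obj_id):
    """Set the bit for obj_id in a clustered sparse bitmap."""
    cid, off = divmod(obj_id, 32)  # 32-bit clusters
    clusters[cid] = clusters.get(cid, 0) | (1 << off)

def bitmap_and(a, b):
    """Bitwise AND; only clusters that are non-zero in both inputs matter."""
    out = {}
    for cid in a.keys() & b.keys():
        word = a[cid] & b[cid]
        if word:
            out[cid] = word
    return out

def to_ids(clusters):
    return sorted(cid * 32 + off
                  for cid, word in clusters.items()
                  for off in range(32) if word >> off & 1)

age_21, name_alice = {}, {}
for oid in (5, 70, 1000):
    set_bit(age_21, oid)
for oid in (5, 1000, 2048):
    set_bit(name_alice, oid)

# Objects with age = 21 AND name = "Alice":
assert to_ids(bitmap_and(age_21, name_alice)) == [5, 1000]
```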


Fig. 19. Sparksee maps for properties, labels, and vertex/edge connectivity. All mappings are bidirectional.

6.9.3 GBase: Sparse Adjacency Matrix Format. GBase [127] is a system that can only represent
the structure of a directed graph; it stores neither properties nor labels. The goal of GBase is to
maintain a compressed representation of the adjacency matrix of a graph such that one can efficiently retrieve
all incoming and outgoing edges of a selected vertex without the prohibitive 𝑂(𝑛²) matrix storage
overheads. Simultaneously, using the adjacency matrix enables verifying in 𝑂(1) time whether
two arbitrary vertices are connected. To compress the adjacency matrix, GBase cuts it into 𝐾²
quadratic blocks (there are 𝐾 blocks along each row and column). Thus, queries that fetch the in- and
out-neighbors of a vertex require fetching only 𝐾 blocks. The parameter 𝐾 can be optimized
for specific databases. When 𝐾 grows larger, one has to retrieve more but smaller files (assuming
one block is stored in one file). When 𝐾 becomes smaller, there are fewer files, but they become larger,
generating overheads. Further optimizations can be made when blocks contain either only zeroes
or only ones; this enables higher compression rates.
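The blocking scheme can be sketched as follows (the parameters and data are invented): the n × n adjacency matrix is cut into blocks with K blocks per dimension, a block is materialized only if it contains at least one edge, and the out-neighbors of a vertex are found by inspecting the K blocks of its block-row.

```python
n, K = 8, 4       # 8 vertices, K blocks along each dimension
bs = n // K       # block side length

edges = {(0, 3), (0, 7), (5, 1)}
blocks = {}       # (block row, block column) -> set of contained edges
for u, v in edges:
    blocks.setdefault((u // bs, v // bs), set()).add((u, v))

def out_neighbors(u):
    br = u // bs
    result = []
    for bc in range(K):  # only the K blocks of u's block-row are inspected
        for (s, t) in blocks.get((br, bc), ()):
            if s == u:
                result.append(t)
    return sorted(result)

assert out_neighbors(0) == [3, 7]
assert (5, 1) in blocks[(2, 0)]  # O(1) block lookup for an edge check
```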

6.10 Data Hubs


Data hubs are systems that enable using multiple data models and corresponding storage designs.
They often combine relational databases with RDF, document, and key-value stores. This can be
beneficial for applications that require a variety of data models, because it provides a variety of
storage options in a single unified database management system. One can keep using RDBMS
features, upon which many companies heavily rely, while also storing graph data.
6.10.1 OpenLink Virtuoso. OpenLink Virtuoso [170] provides RDBMS, RDF, and document
capabilities by connecting to a variety of storage systems. Graphs are stored in the RDF format
only, thus the whole discussion from § 3.3.4 also applies to Virtuoso RDF.
6.10.2 MarkLogic. MarkLogic [150] models graphs with documents for vertices, therefore al-
lowing an arbitrary number of properties for vertices. However, it uses RDF triples for edges.

7 TAKEAWAYS, INSIGHTS, FUTURE DIRECTIONS


We now offer different insights about the described systems, considering both practitioners and
researchers. We interleave these insights with suggestions for future developments and research.

7.1 Discussion, Takeaways, and Insights on Data Organization


We discuss the data organization aspects of our taxonomy with respect to specific graph databases.
For a detailed description and analysis of all the considered aspects, see § 5 and Tables 2 and 3.
7.1.1 Conceptual Graph Models. There is no one standard conceptual graph model, but two
models have proven to be popular: RDF and LPG. RDF is a well-defined standard. However, it only

supports simple triples (subject, predicate, object) representing edges from subject identifiers via
predicates to objects. LPG allows vertices and edges to have labels and properties, thus enabling
more natural data modeling in different scenarios [188]. Still, it is not standardized, and there are
many variants (cf. § 3.3.3). However, it is becoming standardized in the upcoming SQL/PGQ and
GQL standards from ISO [90]. Some systems limit the number of labels to just one. For example,
MarkLogic allows properties for vertices but none for edges, and thus can be viewed as a combination
of LPG (vertices) and RDF (edges). Data stored in the LPG model can be converted to RDF, as
described in § 3.3.5. To benefit from different LPG features while keeping RDF advantages such as
simplicity, some researchers proposed and implemented modifications to RDF. Examples are triple
attributes or attaching triples to other triples (described in § 6.2.2).
Among native graph databases, while no LPG-focused system simultaneously supports RDF,
some RDF systems (e.g., Amazon Neptune) also support LPG. Further, there has been recent work
on unifying RDF and LPG [12, 95, 142]. Many other classes (KV stores, document stores, RDBMS,
wide-column stores, OODBMS) offer only LPG (with some exceptions, e.g., Oracle Spatial and
Graph). The latter suggests that it may be easier to express LPG datasets than RDF datasets
with the respective non-graph data models, such as collections of documents.
Another interesting challenge is to understand better the design tradeoffs between LPG and RDF.
For example, it is unclear which one is more advantageous for different workload classes, under
different design constraints (disk vs. in-memory representation, distributed vs. shared-memory,
replicated vs. sharded, etc.). This could be achieved by developing formal runtime and storage
models followed by an extensive evaluation.
There are very few systems that use neither RDF nor LPG. HyperGraphDB uses the hypergraph
model and GBase uses a simple directed graph model without any labels or properties. Thus, these
models are of less relevance to practitioners, but they offer a potentially interesting direction for
researchers. Especially in the context of hypergraphs, there has been a recent proliferation of
novel schemes in other domains of graph processing, such as graph learning [24], suggesting that
exploring hypergraphs for graph databases may be a timely direction.

7.1.2 Graph Structure Representations. Many graph databases use variants of AL since it makes
traversing neighborhoods efficient and straightforward [188]. This includes several (but not all)
systems in the classes of LPG based native graph databases, KV stores, document stores, wide-
column stores, tuple stores, and OODBMS. Contrarily, none of the considered RDF, RDBMS, and
data hub systems explicitly use AL. This is because the default designs of the underlying data models,
e.g., tables in RDBMS or documents in document stores, do not often rely on AL.
Moreover, none of the systems that we analyzed use an uncompressed AM, as it is inefficient
with its O(n²) space, especially for sparse graphs. Systems using AM focus on compressing the
adjacency matrix [36], trying to mitigate storage and query overheads (e.g., GBase [127]).
In AL, a potential cause for inefficiency is scanning all edges to find the neighbors of a given vertex.
To alleviate this, index structures are employed [40]. For a graph with n vertices, such an index is
an array of pointers to the respective neighborhoods, taking only O(n) space.
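Such a neighborhood index can be sketched as a CSR-style (Compressed Sparse Row) offset array over a flat neighbor array. The following Python sketch is purely illustrative; the helper names are ours and not taken from any specific system:

```python
# Illustrative sketch of an O(n) neighborhood index over a flat edge array,
# similar in spirit to the CSR (Compressed Sparse Row) representation.

def build_offset_index(n, edges):
    """edges: list of (src, dst) pairs; returns (offsets, neighbors)."""
    # Count the out-degree of each vertex.
    degree = [0] * n
    for src, _ in edges:
        degree[src] += 1
    # offsets[v] points to the start of v's neighborhood; n + 1 entries.
    offsets = [0] * (n + 1)
    for v in range(n):
        offsets[v + 1] = offsets[v] + degree[v]
    # Scatter edges into one contiguous neighbor array.
    neighbors = [0] * len(edges)
    cursor = offsets[:-1]  # slice copy: per-vertex write positions
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors

def neighborhood(offsets, neighbors, v):
    # O(1) lookup of the neighborhood slice instead of scanning all edges.
    return neighbors[offsets[v]:offsets[v + 1]]
```

A query for the neighbors of vertex v thus touches only the slice between offsets[v] and offsets[v+1].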

7.2 Discussion, Takeaways, and Insights on Data Optimizations


We discuss separately common optimizations in data organization.

7.2.1 Using Existing Storage Designs. Most graph database systems are built upon existing
storage designs, including key-value stores, wide-column stores, RDBMS, and others. The advantage
of using existing storage designs is that these systems are usually mature and well-tested. The
disadvantage is that they may not be perfectly optimized for graph data and graph queries. This
is what native graph databases attempt to address. Overall, there is a lot of research potential in
more efficient and more effective use of existing storage designs for graph databases. For example,
a promising direction would be to investigate how recent RDBMS development, such as worst-case
optimal joins [166], could be used for graph related workloads.

7.2.2 Data Layout of Records. The details of record-based data organization heavily depend on
a specific system. For example, an RDBMS could treat a table row as a record, key-value stores
often maintain a single value in a single record, while in document stores, a single document could
be a record. Importantly, some systems allow variable-sized records (e.g., ArangoDB), while others
only enable fixed-sized records (e.g., Neo4j). Finally, we observe that while some systems (e.g., some
triple stores such as Cray Graph Engine) do not explicitly mention records, the data could still be
implicitly organized as records. In triple stores, one would naturally associate a triple with a record.
Graph databases often use one or more records per vertex (these records are sometimes referred
to as vertex records). Neo4j uses multiple fixed-size records for vertices, while document databases
use one document per vertex (e.g., ArangoDB). Edges are sometimes stored in the same record
together with the associated (source or destination) vertices (e.g., Titan or JanusGraph). Otherwise,
edges are stored in separate edge records (e.g., ArangoDB).
The records used by the studied graph databases may be unstructured (i.e., not having a
pre-specified format such as JSON), as is the case with KV stores. They can also be structured: document
databases often use the JSON format, wide-column stores have a key-value mapping inside each row,
row-oriented RDBMS divide each row into columns, OODBMS impose some class definition, and
tuple stores as well as some RDF stores use tuples. The details of data layout (i.e., how vertices and
edges are exactly represented and encoded in records) may still vary across different system classes.
Some structured systems still enable highly flexible structure inside their records. For example,
document databases that use JSON, or wide-column stores such as Titan and JanusGraph, allow for
different key-value mappings for each vertex and edge. Other record based systems are more fixed
in their structure. For example, in OODBMS, one has to define a class for each configuration of
vertex and edge properties. In RDBMS, one has to define tables for each vertex or edge type.
Some systems (e.g., Sparksee, some triple stores, or column-oriented RDBMS) do not store
information about vertices and edges contiguously in dedicated records. Instead, they maintain
separate data structures for each property or label. The information about a given vertex is thus
distributed over different structures. To find a property of a particular vertex, one has to query the
associated data structure (index) for that property and find the value for the given vertex. Examples
of such used index structures are B+ trees (in Sparksee) or hashtables (in some RDF systems).
Overall, most systems use records to store vertices, most often one vertex per one record. Some
systems store edges in separate records, others store them together with the adjacent vertices. To
find a property of a particular vertex, one has to find a record containing the vertex. The searched
property is either stored directly in that record, or its location is accessible via a pointer. We observe
that these design choices are often made arbitrarily. We conclude that an interesting research
direction would be to develop formal performance models and use them to guide the most
advantageous design choice for each type of storage backend for graph databases.
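To illustrate the difference between the record styles discussed above, the following sketch contrasts a fixed-size vertex record (in the spirit of Neo4j's design; the field layout is our simplifying assumption, not the actual on-disk format) with a variable-size JSON document (as used by document stores such as ArangoDB):

```python
# Sketch contrasting a fixed-size vertex record with a variable-size
# document record. Field layout and names are illustrative assumptions.
import json
import struct

# Fixed-size record: every vertex occupies the same number of bytes, so the
# record for vertex ID v lives at byte offset v * RECORD_SIZE (O(1) lookup).
VERTEX_RECORD = struct.Struct("<?qq")  # in-use flag, first-edge ID, first-property ID
RECORD_SIZE = VERTEX_RECORD.size

def pack_vertex(in_use, first_edge_id, first_prop_id):
    return VERTEX_RECORD.pack(in_use, first_edge_id, first_prop_id)

def unpack_vertex(store, vertex_id):
    off = vertex_id * RECORD_SIZE
    return VERTEX_RECORD.unpack(store[off:off + RECORD_SIZE])

# Variable-size record: one JSON document per vertex; any number of
# properties, but finding a record requires an index from key to location.
vertex_doc = json.dumps({"_key": "v42", "label": "Person",
                         "name": "Ada", "age": 36})
```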

7.2.3 Adjacencies between Records. Another aspect of a graph data layout is the design of the
adjacency between records. One can either assign each record an ID and then link records to one
another via IDs, or one can use direct memory pointers. Using IDs requires an indexing structure
to find the physical storage address of a record associated with a particular ID. Direct memory
pointers do not require an index for a traversal from one record to its adjacent records. Note that
an index might still be used, for example to retrieve a vertex with a particular property value (in
this context, direct pointers only facilitate resolving adjacency queries between vertices).
Using direct pointers can accelerate graph traversals [188], as additional index lookups are
avoided. IDs, on the one hand, require an additional indexing structure to find the physical
location of a record. On the other hand, if the physical location of a record changes, it is usually
easier to update the indexing structure than to change all associated direct pointers, which may
come with significant overhead [19]. Here, one could also make these design choices and tradeoffs
more principled by designing performance models to guide them.
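A minimal sketch of the ID-based variant, illustrating why relocation is cheap there (all names are illustrative):

```python
# Minimal sketch of ID-based record linking: records refer to each other by
# ID, and an index maps each ID to the record's current physical location.
# Moving a record updates one index entry; ID references stay valid.
# Direct-pointer designs would skip the index lookup on reads.

class RecordStore:
    def __init__(self):
        self.pages = {}      # physical location -> record payload
        self.index = {}      # record ID -> physical location
        self.next_loc = 0

    def insert(self, rec_id, payload):
        loc = self.next_loc
        self.next_loc += 1
        self.pages[loc] = payload
        self.index[rec_id] = loc

    def read(self, rec_id):
        # One extra index lookup compared to following a direct pointer.
        return self.pages[self.index[rec_id]]

    def move(self, rec_id, new_loc):
        # Relocation is cheap: only the index entry changes; there is no need
        # to chase and rewrite every reference to rec_id, as with pointers.
        loc = self.index.pop(rec_id)
        self.pages[new_loc] = self.pages.pop(loc)
        self.index[rec_id] = new_loc
```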
7.2.4 Storing Data Directly in Indexes. Sometimes graph data is stored directly in an index. Triple
stores use indexes for various permutations of subject, predicate and object to answer queries
efficiently. Jena TDB stores its triple data directly inside these indexes and maintains no separate triple table, since
the indexes already store all necessary data [17]. HyperGraphDB uses a key-value index, namely
Berkeley DB [168], to access its physical storage. This approach also enables sharing primitive data
values with a reference count, so that multiple identical values are stored only once [120].
7.2.5 Storing Strings. Many systems, for example RDF stores, heavily use strings, e.g., for
keeping IDs of different parts of graph datasets. Thus, there are different schemes for maintaining
strings. First, many systems support both fixed-size and variable-size strings. This improves
performance: the former (that encode, e.g., IDs) can be kept and accessed rapidly in dedicated
structures that assume certain specific sizes, while the latter (that encode, e.g., arbitrary text items)
often use separate dynamic structures that are slower to access but offer more flexibility [188, 224].
Such dynamic stores can be in-memory structures, but they can even be kept in separate files [188].
The considered systems offer other string related optimizations. For example, CGE optimizes how
it stores strings from its triples/quads. Storing multiple long strings per triple/quad is inefficient,
considering the fact that many triples/quads may share strings. Therefore, CGE – similarly to many
other RDF systems – maintains a dictionary that maps strings to unique 48-bit integer identifiers
(HURIs). For this, two distributed hashtables are used (one for mapping strings to HURIs and one
for mapping HURIs to strings). When loading, the strings are sorted and then assigned to HURIs.
This allows integer comparisons (equal, greater, smaller, etc.) to be used instead of more expensive
string comparisons. This approach is shared by, e.g., tuple stores such as WhiteDB.
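The dictionary-encoding scheme described above can be sketched as follows; the class below is a simplified, single-node stand-in for CGE's pair of distributed hashtables:

```python
# Sketch of dictionary-encoding strings as integer IDs, in the spirit of
# CGE's 48-bit HURIs. The two Python dicts below are a simplified stand-in
# for the two distributed hashtables (string -> ID and ID -> string).

class StringDictionary:
    MAX_ID = (1 << 48) - 1  # CGE uses 48-bit identifiers

    def __init__(self, strings):
        # Sorting before assignment (as CGE does on load) makes the integer
        # order match the lexicographic string order, so integer comparisons
        # (==, <, >) can replace more expensive string comparisons.
        ordered = sorted(set(strings))
        assert len(ordered) <= self.MAX_ID
        self.str_to_id = {s: i for i, s in enumerate(ordered)}
        self.id_to_str = {i: s for i, s in enumerate(ordered)}

    def encode(self, s):
        return self.str_to_id[s]

    def decode(self, i):
        return self.id_to_str[i]
```

Shared strings are stored once in the dictionary, and triples/quads then carry only compact integer IDs.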
RDF systems also harness external dedicated libraries for more effective string management.
For example, the RDF String library [179] facilitates constructing RDF strings. Specifically, it
automatically generates string representations of specified values, together with appropriate URI
prefixes. This library can be used to even encode multi-line strings, or conduct string interpolation
with embedded expressions. Another example is RDF String Turtle [214], a package that enables
conversions between the string-based and RDF JSON representations of RDF.
7.2.6 Data Distribution. Almost all considered systems support a multi-server mode and data
replication. Data sharding is also widely supported, but there are some systems that do not offer
this feature. We expect that, with growing dataset sizes, data sharding will ultimately become as
common as data replication. Still, it is more complex to provide. We observe that, while sharding is
as widely supported on graph databases based on non-graph data models (e.g., document stores)
as data replication, there is a significant fraction of native graph databases (both RDF and LPG
based) that offer replication but not sharding. This indicates that non-graph backends are usually
more mature in designs and thus more attractive for industry purposes and for practitioners,
simultaneously implying research potential in the native graph database designs. We also observe
that certain systems offer some form of tradeoff between replication and sharding. Specifically,
OrientDB offers a form of sharding, in which not all collections of documents have to be copied
on each server. However, OrientDB does not enable sharding of the collections themselves (i.e.,
distributing one collection across many servers). If an individual collection grows large, it is the
responsibility of the user to partition the collection to avoid any additional overheads. Thus, it
would be interesting to research how to automate the sharding of single document collections, or
even single (large) documents. Another such example is Neo4j, which supports replication and
provides a certain level of support for sharding. Specifically, the user can partition the graph and
store each partition in a separate database, limiting data redundancy.
Here, another interesting research opportunity would be to accelerate graph database workloads
such as OLAP by harnessing partial data replication. Specifically, many OLAP graph database
workloads such as BFS can be expressed with linear algebra operations [130], and these workloads
could benefit from the partial replication of the adjacency matrix, for example as done in 2.5D
and 3D matrix multiplications [205, 206].
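For illustration, BFS expressed in this algebraic style repeatedly multiplies the (Boolean) adjacency matrix by a frontier vector; it is this matrix that partial-replication schemes would distribute. The sketch below (function name ours) uses plain Python lists as a stand-in for a real sparse-matrix library:

```python
# Sketch of BFS expressed with linear algebra: each step is one Boolean
# matrix-vector product of the adjacency matrix with the current frontier,
# masked by the set of not-yet-visited vertices.

def bfs_levels(adj, source):
    """adj[u][v] == 1 iff there is an edge u -> v; returns BFS level per vertex
    (-1 for unreachable vertices)."""
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [1 if v == source else 0 for v in range(n)]
    step = 0
    while any(frontier):
        step += 1
        # One "SpMV": nxt[v] = OR over u of (frontier[u] AND adj[u][v]).
        nxt = [0] * n
        for u in range(n):
            if frontier[u]:
                for v in range(n):
                    if adj[u][v] and level[v] == -1:
                        nxt[v] = 1
                        level[v] = step
        frontier = nxt
    return level
```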
7.2.7 Indexes. Most graph database systems use indexes. Systems based on non-graph
backends, for example RDBMS or document stores, usually rely on the existing indexing infrastructure
present in such systems. Native graph databases employ index structures for the neighborhoods of
each vertex, often in the form of direct pointers [188].
Neighborhood indexes are used mostly to speed up the access of adjacency lists to accelerate
traversal queries. JanusGraph calls these indexes vertex-centric. They are constructed specifically
for vertices, so that incident edges can be filtered efficiently to match the traversal conditions [23].
While JanusGraph allows multiple vertex-centric indexes per vertex, each optimized for different
conditions, which are then chosen by the query optimizer, simpler solutions exist as well. LiveGraph
uses a two-level hierarchy, where the first level distinguishes edges by their label, before pointing
to the actual physical storage [230]. Graphflow indexes the neighbors of a vertex into forward and
backward adjacency lists, where each list is first partitioned by the edge label, and secondly by the
label of the neighbor vertex [128]. Another example is Sparksee, which uses various different index
structures to find the adjacent vertices and properties of a vertex [151].
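A Graphflow-style vertex-centric index can be sketched as follows; the structure and names are illustrative assumptions based on the description above:

```python
# Sketch of a vertex-centric adjacency index in the spirit of Graphflow:
# for each vertex, forward and backward lists partitioned first by edge
# label and then by the label of the neighbor vertex.
from collections import defaultdict

class VertexCentricIndex:
    def __init__(self):
        # fwd[v][edge_label][neighbor_label] -> list of neighbor IDs
        nested = lambda: defaultdict(lambda: defaultdict(list))
        self.fwd = defaultdict(nested)
        self.bwd = defaultdict(nested)

    def add_edge(self, src, dst, edge_label, src_label, dst_label):
        self.fwd[src][edge_label][dst_label].append(dst)
        self.bwd[dst][edge_label][src_label].append(src)

    def out_neighbors(self, v, edge_label, neighbor_label):
        # Label-filtered traversals touch only the matching partition
        # instead of scanning and filtering the whole neighborhood.
        return self.fwd[v][edge_label][neighbor_label]
```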
An interesting research opportunity is to design indexes for richer “higher-order” structural
information, beyond plain neighborhoods. Specifically, a recent wave of pattern matching schemes [41,
158] indicates the importance of higher-order graph structure, such as triangles that each vertex
belongs to. Indexing such information would significantly speed up queries related to clique mining,
dense subgraph discovery, clustering, and many others. While some work has been done in this
respect [196], many designs could be proposed.
Data indexes concern data beyond the neighborhood information, and they can be used to
accelerate query plan generation and query execution. It is possible for example to index all vertices
that have a specific property (value). They are usually employed to speed up Business Intelligence
workloads (details on workloads are in § 4). Many triple stores, for example AllegroGraph [92],
provide all six permutations of subject (S), predicate (P), and object (O) as well as additional
aggregated indexes. However, to reduce associated costs, other approaches exist as well: TripleBit
uses just two permutations (PSO, POS) with two aggregated indexes (SP, SO) and two auxiliary
index structures [228]. gStore implements pattern matching queries with the help of two index
structures: a VS*-tree, which is a specialized B+-tree, and a trie-based T-index [231]. Some database
systems like Amazon Neptune [5] or AnzoGraph [55] only provide implicit indexes, while still
claiming to answer all kinds of queries efficiently. However, most graph database systems
allow the user to explicitly define data indexes. Some of them, like Azure Cosmos DB [161], support
composite indexes (a combination of different labels/properties) for more specific use cases. In
addition to internal indexes, some systems employ external indexing tools. For example, Titan and
Graph Database System   Tree   Hashtable   Skip list   Additional remarks
Apache Jena TDB         ✓∗     ✗           ✗           ∗ B+-tree
ArangoDB                ✗      ✓∗          ✓∗          ∗ depends on the used index engine
Blazegraph              ✓∗     ✗           ✗           ∗ B+-tree
Dgraph                  ✓      ✗           ✗
Memgraph                ✗      ✗           ✓
OrientDB                ✓∗     ✓‡          ✗           ∗ SB-tree with base 500; ‡ also supports a distributed hashtable index
VelocityGraph           ✓∗     ✗           ✗           ∗ B-tree
Virtuoso                ✓∗     ✗           ✗           ∗ 2D R-tree
WhiteDB                 ✓∗     ✗           ✗           ∗ T-tree

Table 5. Support for different index implementations in different graph database systems. “✓”: A system supports
a given index implementation. “✗”: A system does not support a given index implementation.

JanusGraph [23] use internal indexing for label- and value-based lookups, but rely on external
indexing backends (e.g., Elasticsearch [82] or Apache Solr [13]) for non-trivial lookups involving
multiple properties, ranges, or full-text search.
Structural indexes are used for various internal data. Here, LiveGraph uses a vertex index to
map its vertex IDs to a physical storage location [230]. ArangoDB uses a hybrid index, a hashtable,
to find the documents of incident edges and adjacent vertices of a vertex [18].
We categorize systems (for which we were able to find this information) according to these
criteria in Table 5. We find no clear connection between the index type and the backend of a graph
database, but most systems use tree based indexes. A research opportunity, useful especially for
practitioners, would be to conduct a detailed and broad performance comparison of different index
implementations for different workloads.

7.2.8 Data Organization vs. Database Performance. Record-based systems usually deliver higher
performance for queries that need to retrieve all or most information about a vertex or an edge.
They are more efficient because the required data is stored in consecutive memory blocks. In
systems that store data in indexes, one queries a data structure per property, which results in a
more random access pattern. On the other hand, if one only wants to retrieve single properties
about vertices or edges, such systems may only have to retrieve a single value. Contrarily, many
record based systems cannot retrieve only parts of records, fetching more data than necessary.
Furthermore, a decision on whether to use IDs versus direct memory pointers to link records
depends on the read/write ratio of the workload for the given system. In the former case, one
has to use an indexing structure to find the address of the record. This slows down read queries
compared to following direct pointers. However, write queries can be more efficient with the use of
IDs instead of pointers. For example, when a record has to be moved to a new address, all pointers
to this record need to be updated to reflect this new address. IDs could remain the same, only the
indexing structure needs to modify the address of the given record.
The available performance studies [9, 133, 153, 154, 220] indicate that systems based on non-graph
data models, for example document stores or wide-column stores, usually achieve higher performance
for transactional workloads that update the graph. Contrarily, read-only workloads (both simple
and global analytics) often perform better on native graph stores. Global analytics
particularly benefit from native graph stores that ensure parallelization of single queries [153].
This underlines the potential and research opportunities for developing hybrid database systems for
graph workloads that could combine the advantages of databases using non-graph data models
and of native graph stores.

7.3 Discussion and Takeaways on Query Execution
We discuss the query execution aspects of our taxonomy with respect to the specific graph databases.
Our discussion is by necessity brief, as most systems do not disclose this information.7

7.3.1 Concurrency & Parallelization. We start with concurrency and parallelization.


Support for Concurrency and Parallelism in OLTP Queries We conclude that (1) almost all
systems support concurrent OLTP queries, and (2) in almost all classes of systems, fewer systems
support parallel OLTP query execution (with the exception of OODBMS based graph databases).
This indicates that most databases put more stress on a high throughput of queries executed per
time unit rather than on lowering the latency of a single query. A notable exception is the Cray
Graph Engine, which does not support concurrent queries, but it does offer parallelization of single
queries. In general, we expect most systems to ultimately support both features. With the growing
importance of more complex OLTP workloads and the ongoing process of blurring the difference
between complex OLTP and Business Intelligence workloads, putting more focus on parallel OLAP
designs is becoming a more attractive and urgent research direction [213].
Support for Concurrency and Parallelism in OLAP Queries OLAP queries are usually
parallelized. This is because such queries often involve all the vertices and edges, making the
sequential execution prohibitively long. Moreover, parallelization of such queries is facilitated by
the fact that there exists a plethora of related work, due to the prevalence of certain OLAP queries
(e.g., traversals, centralities) in static graph analytics [29, 38, 126, 130, 155]. At the same time,
concurrent execution of different OLAP queries is supported only to some degree. For example,
Weaver supports concurrent large-scale OLAP queries such as BFS, but each such query processes
an immutable separate snapshot of the graph dataset. One could attempt to investigate how to
effectively run concurrent OLAP queries (as well as large-scale read-only Business Intelligence
workloads that have similar characteristics), which potentially only requires a certain amount of
basic state bookkeeping.
Support for Concurrent OLAP & OLTP Queries Concurrent execution of OLAP and OLTP
queries is not widely supported. Systems that support multi-versioning, such as LiveGraph or
Weaver, run an OLAP query on a consistent graph snapshot, while concurrent OLTP queries modify
different vertex and/or edge versions or introduce newer versions. Other systems such as Neo4j,
while allowing concurrent OLTP and OLAP queries, leave dealing with inconsistencies of OLAP
queries to the client, as the default isolation level (read-committed) neither protects such queries
from modification by other queries nor offers repeatable reads. This aspect is potentially rich in
research and new design opportunities, as it requires fundamental understanding of different effects
that could occur in native graph stores when executing concurrent OLAP and OLTP, analogously
to the effects happening at different isolation levels in RDBMS systems.
Implementing Concurrent Execution One of the methods for query concurrency is the use of different
types of locks. For example, WhiteDB provides database-wide locking with a reader-writer lock [184,
224] which enables concurrent readers but only one writer at a time. As an alternative to locking
the whole database, one can also update fields of tuples atomically (set, compare and set, add).
WhiteDB itself does not enforce consistency, it is up to the user to use locks and atomics correctly.
Another method is based on transactions, used for example by OrientDB that provides distributed
transactions with ACID semantics. We discuss transactions separately in § 7.3.2. Here, an interesting
research opportunity would be to harness lock-free synchronization protocols known from parallel
computing [45] when implementing different fine-grained OLTP queries.
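A reader-writer lock of the kind WhiteDB exposes (many concurrent readers, one writer) can be sketched as follows; Python's standard library has no such lock, so this minimal version, in which writers may starve, is built from a condition variable and is not WhiteDB's actual C implementation:

```python
# Minimal sketch of database-wide reader-writer locking: concurrent readers,
# exclusive writer. A simple, writer-starvation-prone variant for clarity.
import threading

class ReaderWriterLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:       # wait while a writer holds the lock
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:    # last reader wakes waiting writers
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```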

7 There is usually much more information available on the data layout of a graph database than on its execution engine.

Optimizing Parallel Execution Some of the systems that support parallel query execution
explicitly optimize the amount of data communicated when executing such parallelized queries. For
example, the computation in CGE is distributed over the participating processes. To minimize the
amount of all-to-all communication, query results are aggregated locally and – whenever possible –
each process only communicates with a few peers to avoid network congestion. Another way to
minimize communication, used by MS Graph Engine and the underlying Trinity database, is to
reduce the sizes of data chunks exchanged by processes. For this, Trinity maintains special accessors
that allow for accessing single attributes within a cell without needing to load the complete cell.
This lowers the I/O cost for many operations that do not need the whole cells. Several systems
harness one-sided communication, enabling processes to access one another’s data directly [98]. For
example, Trinity can be deployed on InfiniBand [118] to leverage Remote Direct Memory Access
(RDMA) [98]. Similarly, Cray’s infrastructure makes memory resources of multiple compute nodes
available as a single global address space, also enabling one-sided communication in CGE. This
facilitates parallel programming in a distributed environment [34, 98, 198].
Here, a promising research opportunity is to harness established optimized communication
routines such as collectives [59] for large-scale OLAP and BI.
Other Execution Optimizations The considered databases come with numerous other
system-specific design optimizations. For example, an optimization in ArangoDB’s design makes it possible to skip
accessing the vertex document and enables directly accessing one edge document based on the
vertex ID within another edge document. This may improve cache efficiency and thus reduce query
execution time [19]. Another example is Oracle Spatial and Graph that offers an interesting option
of switching its data backend based on the query being executed. Specifically, its in-memory analysis is
boosted by the possibility to switch the underlying relational storage with the native graph storage
provided by the PGX processing engine [81, 117, 191]. In such a configuration, Oracle Spatial and
Graph effectively becomes a native graph database. PGX comes with two variants, PGX.D and
PGX.SM, that – respectively – offer distributed and shared-memory processing capabilities [117].
Furthermore, some systems use the LPG specifics to implement OLAP queries more effectively. For
example, to implement BFS, Weaver uses special designated properties associated with vertices to
indicate whether a vertex has already been visited. Such special properties are not visible
to an external user of the graph database and are usually deallocated after a given query is finalized.
A lot of work has been done on optimizing distributed-memory algebraic operations such as
matrix products using optimal communication routines [96, 100, 137–139, 205, 206]. It would be
interesting to investigate how such routines can be used to speed up OLAP graph queries.
There is a large body of existing work in the design of dynamic graph processing frameworks [33].
These systems differ from graph databases in several aspects, for example they often employ simple
graph models (and not LPG or RDF). Simultaneously, they share the fundamental property of graph
databases: dealing with a dynamic graph with evolving structure. Moreover, different performance
analyses indicate that streaming frameworks are much faster (up to orders of magnitude) than
graph databases, especially in the context of raw graph updates per second [154, 220]. This suggests
that harnessing mechanisms used in such frameworks in the context of graph databases could
significantly enhance the performance of the latter, and is a potentially fruitful research direction.
Furthermore, while there exists past research into the impact of the underlying network on the
performance of a distributed graph analytics framework [172], little has been done to investigate this
performance relationship in the context of graph database workloads. To the best of our knowledge,
there are no efforts to develop a topology-aware or routing-aware data distribution scheme for
graph databases, especially in the context of recently proposed data center and high-performance
computing network topologies [35, 132] and routing architectures [32].

Finally, in contrast to general static graph processing and graph streaming, little research
exists on accelerating graph databases using different types of hardware architectures, accelerators,
and hardware-related designs, for example FPGAs [39], designs related to network interface cards
such as SmartNICs [73], or processing in memory [4].

7.3.2 ACID. We also discuss various aspects of ACID transactions. ACID transactions are usually
used to implement OLTP queries. OLAP queries are read-only analytics, and thus are less relevant
for the notion of ACID. For example, in Weaver, OLAP workloads are referred to as “node programs”
and are treated differently from transactions, which are used to implement OLTP queries.
Support Overall, support for ACID transactions is widespread in graph databases. However,
there are some differences between respective system classes. For example, all considered document
and RDBMS graph databases offer full ACID support. Contrarily, only around half of all considered
key-value and wide-column based systems support ACID transactions. This could be caused by the
fact that some backends have more mature transaction related designs.
Implementation Two important ways to implement transactions are through locking or
timestamps. Neo4j uses write locks to protect modifications until they are committed. Weaver uses
timestamps to reorder transactions into a serializable order if necessary. Other systems such as
LiveGraph combine both ways, and use timestamps to select the correct version of a vertex/edge
(solving simple read/write conflicts) and vertex locks to deal with concurrent writes.
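A sketch of the timestamp-based version selection described above, in the spirit of LiveGraph and Weaver; the data structures are illustrative, and the per-object write locks mentioned in the text are omitted:

```python
# Sketch of timestamp-based multi-versioning: an OLAP reader with snapshot
# timestamp T sees the latest version committed at or before T, untouched by
# later OLTP writes. Assumes writes arrive in increasing commit-timestamp
# order (a per-object lock, omitted here, would serialize writers).
import bisect

class VersionedValue:
    def __init__(self):
        self._timestamps = []  # sorted commit timestamps
        self._values = []      # value committed at each timestamp

    def write(self, commit_ts, value):
        self._timestamps.append(commit_ts)
        self._values.append(value)

    def read(self, snapshot_ts):
        # Binary search for the latest version with commit_ts <= snapshot_ts.
        i = bisect.bisect_right(self._timestamps, snapshot_ts)
        return self._values[i - 1] if i > 0 else None
```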

7.3.3 Support for OLAP and OLTP Queries. We analyze support for OLTP and OLAP. Both
categories are widely supported, but with certain differences across specific backend classes,
specifically, (1) all considered document stores focus solely on OLTP, (2) some RDBMS graph
databases do not support or focus on OLAP, and (3) some native graph databases do not support
OLTP. We conjecture that this is caused by the prevalent historic use cases of these systems, and the
associated features of the backend design. For example, document stores have traditionally mostly
focused on maintaining document related data and to answer simple queries, instead of running
complicated global graph analytics. Thus, it may be very challenging to ensure high performance
of such global workloads on this backend class. Instead, native graph databases work directly with
the graph data model, making it simpler to develop fast traversals and other OLAP workloads.
As for RDBMS, they were traditionally not associated with graph global workloads. However,
graph analytics based on RDBMS has become a separate and growing area of research. Zhao et
al. [229] study the general use of RDBMS for graphs. They define four new relational algebra
operations for modeling graph operations. They show how to define these four operations with
six smaller building blocks: basic relational algebra operations, such as group-by and aggregation.
Xirogiannopoulos et al. [225] describe GraphGen, an end-to-end graph analysis framework that
is built on top of an RDBMS. GraphGen supports graph queries through so-called Graph-Views
that define graphs as transformations over underlying relational datasets. This provides a graph
modeling abstraction, and the underlying representation can be optimized independently.
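The relational building blocks discussed above can be illustrated with a plain SQL engine. The following is a toy sketch with an illustrative two-column edge table, not the actual operators of Zhao et al. or GraphGen: neighbor expansion becomes a self-join over the edge table, and degree computation becomes a group-by aggregation.

```python
import sqlite3

# Toy edge table for a directed graph (the schema is illustrative only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
con.executemany("INSERT INTO edges VALUES (?, ?)",
                [(1, 2), (1, 3), (2, 3), (3, 1)])

# Out-degree of every vertex: a group-by aggregation.
degrees = dict(con.execute(
    "SELECT src, COUNT(*) FROM edges GROUP BY src"))

# Vertices reachable in exactly two hops from vertex 1: a self-join.
two_hop = {dst for (dst,) in con.execute(
    "SELECT DISTINCT e2.dst FROM edges e1 "
    "JOIN edges e2 ON e1.dst = e2.src WHERE e1.src = 1")}

assert degrees == {1: 2, 2: 1, 3: 1}
assert two_hop == {1, 3}
```

Deeper traversals compose further self-joins (or recursive common table expressions), which is precisely where relational optimizers have historically not been tuned for graph workloads.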
Some document stores still provide at least partial support for traversal-like workloads. For
example, in ArangoDB, documents are indexed using a hashtable, where the _key attribute serves
as the hashtable key. A traversal over the neighbors of a given vertex works as follows. First, given
the _key of a vertex 𝑣, ArangoDB finds all 𝑣’s adjacent edges using the hybrid index. Next, the
system retrieves the corresponding edge documents and fetches all the associated _to properties.
Finally, the _to properties serve as the new _key properties when searching for the neighboring
vertices.
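This lookup sequence can be sketched with plain dictionaries standing in for the hash-based index. The `_key`/`_from`/`_to` attribute names follow ArangoDB's convention, but the index structure below is a simplification, not the actual hybrid index:

```python
# Vertex documents, indexed by their _key attribute (as in a hashtable).
vertices = {
    "alice": {"_key": "alice", "name": "Alice"},
    "bob":   {"_key": "bob",   "name": "Bob"},
    "carol": {"_key": "carol", "name": "Carol"},
}
# Edge documents connect vertices via _from and _to.
edges = [
    {"_from": "alice", "_to": "bob"},
    {"_from": "alice", "_to": "carol"},
]
# Simplified stand-in for the hybrid index: _from -> adjacent edges.
out_index = {}
for e in edges:
    out_index.setdefault(e["_from"], []).append(e)

def neighbors(key):
    # Step 1: find the adjacent edges of the vertex with this _key.
    # Step 2: fetch the _to property of each edge document.
    # Step 3: use the _to values as _key lookups for the neighbor vertices.
    return [vertices[e["_to"]] for e in out_index.get(key, [])]

assert sorted(v["name"] for v in neighbors("alice")) == ["Bob", "Carol"]
```

Each hop thus costs a constant number of hash lookups, which is how a document store can approximate the traversal performance of index-free adjacency.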
There are other correlations between supported workloads and system design features. For
instance, we observe that systems that do not target OLTP often also do not provide, or focus on,
ACID transactions. This is because ACID is not commonly used with OLAP. Examples include Cray
Graph Engine, RedisGraph, or Graphflow.
There also exist many OLAP graph workloads that have been largely unaddressed by the design
and performance analyses of existing graph database systems. This includes vertex reordering
problems (e.g., listing vertices by their degeneracy) or optimization problems (e.g., graph coloring) [31].
These problems were considered in the context of graph algorithms processing simple graphs, and
incorporating rich models such as RDF would further increase complexity, and offer many associated
research challenges, for example designing indexes, data layouts, or distribution strategies.
7.3.4 Supported Languages. We also analyze support for graph query languages. Some types of
backends focus on one specific language: triple stores and SPARQL, document stores and Gremlin,
wide-column stores and Gremlin, RDBMS and SQL. Other classes are not distinctively correlated
with some specific language, although Cypher seems most popular among LPG based native graph
stores. Usually, the query language support is primarily affected by the supported conceptual graph
model; if it is RDF, then the system usually supports SPARQL while systems focusing on LPG often
support Cypher or Gremlin.
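To make this correlation concrete, here is the same one-hop query – "names of the people Alice knows" – expressed in Cypher (LPG stores), SPARQL (RDF triple stores), and Gremlin (Apache TinkerPop). The labels, properties, and prefixes are illustrative and not tied to any specific dataset:

```python
# Cypher: declarative pattern matching over labeled property graphs.
cypher = ("MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person) "
          "RETURN b.name")

# SPARQL: matching subject-predicate-object triples in RDF stores.
sparql = ('PREFIX foaf: <http://xmlns.com/foaf/0.1/> '
          'SELECT ?name WHERE { ?a foaf:name "Alice" . '
          '?a foaf:knows ?b . ?b foaf:name ?name }')

# Gremlin: an imperative-style traversal over the graph.
gremlin = "g.V().has('name', 'Alice').out('knows').values('name')"

# All three express the same one-hop neighborhood lookup.
queries = {"cypher": cypher, "sparql": sparql, "gremlin": gremlin}
```

The structural differences mirror the underlying models: Cypher patterns follow LPG edges, SPARQL decomposes the query into triples, and Gremlin spells out the traversal steps.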
Several systems come with their own languages, or variants of the established ones. For example,
in MS Graph Engine, cells are associated with a schema that is defined using the Trinity Specification
Language (TSL) [199]. TSL enables defining the structure of cells similarly to C-structs. For example,
a cell can hold data items of different data types, including IDs of other cells. Moreover, querying
graphs in Oracle Spatial and Graph is possible using PGQL [221], a declarative, SQL-like, graph
pattern matching query language. PGQL is designed to match the hybrid structure of Oracle Spatial
and Graph, and it allows for querying both data stored on disk in Oracle Database as well as in
in-memory parts of graph datasets.
Besides their primary language, systems also offer support for additional language functionalities.
For example, Oracle Spatial and Graph also supports SQL and SPARQL (for RDF graphs). Moreover,
the offered Java API implements Apache Tinkerpop interfaces, including the Gremlin API.

7.4 Insights for Practitioners


The important question of whether RDBMS or non-relational native graph backends are more
suitable for graph workloads is far from fully answered. On one hand, several analyses [3,
87, 123, 174, 217] – including very recent ones [216] – indicate better performance of RDBMS
over native graph database designs. However, these analyses focus on systems designed with a
single class of workloads in mind (e.g., OLAP analytics), and on homogeneous graphs without
rich additional label and property data. Thus, they are not conclusive for more realistic scenarios
where a mix of OLAP, OLTP, and BI workloads runs over rich LPG datasets. This is supported by
another recent study, which states that “The workloads (...) require several storage and processing
features that existing RDBMSs are generally not optimized for.” [89].
In general, whenever the data schema is known in advance, RDBMS – with their long-standing
history of optimizations for such cases – would be a preferable choice. Conversely, when the data
schema is not known, graph databases will most likely offer better performance.
Overall, systems based on non-graph data models, such as RDBMS or – to a certain degree –
others (e.g., document stores), offer the most mature designs and well-understood behavior related to
different isolation levels. As such, these systems are the best option when one needs, above all,
predictable behavior, together with reasonable performance for standard workloads.
On the other hand, when one aims at the highest performance for purely graph workloads, it is worth
considering native graph stores. This is especially the case for the most recent graph workload
classes that are only now being introduced in the GDB landscape, such as subgraph queries [158].

8 CONCLUSION
Graph databases constitute an important area of academic research and different industry efforts.
They are used to maintain, query, and analyze numerous datasets in different domains in industry
and academia. Many graph databases of different types have been developed. They use many data
models and representations, they are constructed using miscellaneous design choices, and they
enable a large number of queries and workloads. In this work, we provide the first survey and
taxonomy of this rich graph database landscape. Our work can be used not only by researchers
willing to learn more about this fascinating subject, but also by architects, developers, and project
managers who want to select the most advantageous graph database system or design.

Acknowledgments
We thank Gabor Szarnyas for extensive feedback, and Hannes Voigt, Matteo Lissandrini, Daniel
Ritter, Lukas Braun, Janez Ales, Nikolay Yakovets, and Khuzaima Daudjee for insightful remarks.
We thank Timo Schneider for immense help with infrastructure at SPCL, and PRODYNA AG (Darko
Križić, Jens Nixdorf, Christoph Körner) for generous support. This project received funding from the
European Research Council (Project PSAP, No. 101002047), and the European High-Performance
Computing Joint Undertaking (JU) under grant agreement No. 955513 (MAELSTROM). This project
was supported by the ETH Future Computing Laboratory (EFCL), financed by a donation from
Huawei Technologies. This project received funding from the European Union’s HE research and
innovation programme under the grant agreement No. 101070141 (Project GLACIATION).

References
[1] Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2007. Scalable Semantic Web Data Management
Using Vertical Partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB ’07).
411–422.
[2] Sunitha Abburu and Suresh Babu Golla. 2015. Effective Partitioning and Multiple RDF Indexing for Database Triple
Store. Engineering Journal 19, 5 (Oct. 2015), 139–154.
[3] Christopher R. Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2017. Empty-
Headed: A Relational Engine for Graph Processing. ACM Trans. Database Syst. 42, 4, Article 20 (Oct 2017), 44 pages.
[4] Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A Scalable Processing-in-Memory
Accelerator for Parallel Graph Processing. In Proceedings of the 42nd Annual International Symposium on Computer
Architecture (ISCA ’15). ACM, 105–117.
[5] Amazon. 2018. Amazon Neptune. Available at https://aws.amazon.com/neptune/. (2018).
[6] Amazon. 2022. Columnar Storage Developer Guide. https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_
storage_disk_mem_mgmnt.html. (2022).
[7] Renzo Angles, Marcelo Arenas, Pablo Barcelo, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker,
Marcus Paradies, Stefan Plantikow, Juan Sequeda, Oskar van Rest, and Hannes Voigt. 2018. G-CORE: A Core for Future
Graph Query Languages. In Proceedings of the International Conference on Management of Data (SIGMOD ’18). ACM,
1421–1432.
[8] Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Domagoj Vrgoč. 2017. Foundations of
Modern Query Languages for Graph Databases. ACM Comput. Surv. 50, 5, Article 68 (Sept. 2017), 40 pages.
[9] Renzo Angles, Peter Boncz, Josep Larriba-Pey, Irini Fundulaki, Thomas Neumann, Orri Erling, Peter Neubauer, Norbert
Martinez-Bazan, Venelin Kotsev, and Ioan Toma. 2014. The Linked Data Benchmark Council: A Graph and RDF
Industry Benchmarking Effort. SIGMOD Rec. 43, 1 (May 2014), 27–31.
[10] Renzo Angles and Claudio Gutierrez. 2008. Survey of Graph Database Models. ACM Comput. Surv. 40, 1, Article 1 (Feb.
2008), 39 pages.
[11] Renzo Angles and Claudio Gutierrez. 2018. An Introduction to Graph Data Management. In Graph Data Management,
Fundamental Issues and Recent Developments, George H. L. Fletcher, Jan Hidders, and Josep Lluís Larriba-Pey (Eds.).
Springer, 1–32.
[12] Renzo Angles, Aidan Hogan, Ora Lassila, Carlos Rojas, Daniel Schwabe, Pedro Szekely, and Domagoj Vrgoč. 2022.
Multilayer Graphs: A Unified Data Model for Graph Databases. In Proceedings of the 5th ACM SIGMOD Joint International

Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-
NDA ’22). Article 11, 6 pages.
[13] Apache Software Foundation. 2004. Apache Solr. Available at https://solr.apache.org/. (2004).
[14] Apache Software Foundation. 2008. Apache Cassandra. Available at https://cassandra.apache.org/. (2008).
[15] Apache Software Foundation. 2013. Apache Giraph. Available at https://giraph.apache.org/. (2013).
[16] Apache Software Foundation. 2018. Apache Marmotta. Available at http://marmotta.apache.org/. (2018).
[17] Apache Software Foundation. 2021. Apache Jena TDB. Available at https://jena.apache.org/documentation/tdb/index.html. (2021).
[18] ArangoDB Inc. 2018. ArangoDB. Available at https://www.arangodb.com/docs/stable/data-models.html. (2018).
[19] ArangoDB Inc. 2018. ArangoDB: Index Free Adjacency or Hybrid Indexes for Graph Databases. Available at https:
//www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/. (2018).
[20] Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: A Database
Benchmark Based on the Facebook Social Graph. In Proceedings of the International Conference on Management of Data
(SIGMOD ’13). ACM, 1185–1196.
[21] Malcolm Atkinson, David DeWitt, David Maier, François Bancilhon, Klaus Dittrich, and Stanley Zdonik. 1989. The
Object-Oriented Database System Manifesto. In Proceedings of the First International Conference on Deductive and
Object-Oriented Databases (DOOD ’89). 223–240.
[22] Paolo Atzeni and Valeria De Antonellis. 1993. Relational Database Theory. Addison-Wesley.
[23] Aurelius. 2015. Titan Data Model. Available at http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html. (2015).
[24] Song Bai, Feihu Zhang, and Philip H. S. Torr. 2021. Hypergraph convolution and hypergraph attention. Pattern
Recognition 110 (Feb. 2021), 107637.
[25] Alexandru T. Balaban. 1985. Applications of Graph Theory in Chemistry. J. Chem. Inf. Comput. Sci. 25, 3 (Aug. 1985),
334–343.
[26] Sumita Barahmand and Shahram Ghandeharizadeh. 2013. BG: A Benchmark to Evaluate Interactive Social Networking
Actions. In Proceedings of the 6th Biennial Conference on Innovative Data Systems Research (CIDR ’13).
[27] Jesús Barrasa. 2017. RDF Triple Stores vs. Labeled Property Graphs: What’s the Difference? Available at https:
//neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference/. (2017).
[28] Daniel Bartholomew. 2012. MariaDB vs. MySQL. Dostopano 7, 10 (2012).
[29] Omar Batarfi, Radwa El Shawi, Ayman G Fayoumi, Reza Nouri, Seyed-Mehdi-Reza Beheshti, Ahmed Barnawi, and
Sherif Sakr. 2015. Large Scale Graph Processing Systems: Survey and an Experimental Evaluation. Cluster Computing
18, 3 (2015), 1189–1213.
[30] Scott Beamer, Krste Asanović, and David Patterson. 2015. The GAP Benchmark Suite. (2015). arXiv:1508.03619
[31] Maciej Besta, Armon Carigiet, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, and Torsten Hoefler. 2020.
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality. In Proceedings of the
International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20). IEEE, Article 99,
17 pages.
[32] Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla,
and Torsten Hoefler. 2021. High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC
Networks. IEEE Transactions on Parallel and Distributed Systems 32, 4 (2021), 943–959.
[33] Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, and Torsten Hoefler. 2023. Practice of Streaming
Processing of Dynamic Graphs: Concepts, Models, and Systems. IEEE Transactions on Parallel and Distributed Systems
34, 6 (2023), 1860–1876.
[34] Maciej Besta and Torsten Hoefler. 2014. Fault Tolerance for Remote Memory Access Programming Models. In
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’14).
ACM, 37–48.
[35] Maciej Besta and Torsten Hoefler. 2014. Slim Fly: A Cost Effective Low-Diameter Network Topology. In Proceedings of
the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’14). IEEE, 348–359.
[36] Maciej Besta and Torsten Hoefler. 2018. Survey and Taxonomy of Lossless Graph Compression and Space-Efficient
Graph Representations. (2018). arXiv:1806.01799
[37] Maciej Besta, Florian Marending, Edgar Solomonik, and Torsten Hoefler. 2017. SlimSell: A Vectorizable Graph
Representation for Breadth-First Search. In Proceedings of the International Parallel and Distributed Processing Symposium
(IPDPS ’17). IEEE, 32–41.
[38] Maciej Besta, Michał Podstawski, Linus Groner, Edgar Solomonik, and Torsten Hoefler. 2017. To Push or To Pull:
On Reducing Communication and Synchronization in Graph Computations. In Proceedings of the 26th International
Symposium on High-Performance Parallel and Distributed Computing (HPDC ’17). ACM, 93–104.

[39] Maciej Besta, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, and Torsten Hoefler. 2019. Graph Processing
on FPGAs: Taxonomy, Survey, Challenges. (2019). arXiv:1903.06697
[40] Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, and Torsten Hoefler. 2018. Log(Graph):
A Near-Optimal High-Performance Graph Representation. In Proceedings of the 27th International Conference on Parallel
Architectures and Compilation Techniques (PACT ’18). ACM, Article 7, 13 pages.
[41] Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi,
Jakub Beranek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Ozdemir, Adrian Balla,
Marcin Copik, Philipp Lindenberger, Marek Konieczny, Onur Mutlu, and Torsten Hoefler. 2021. GraphMineSuite:
Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra. Proc. VLDB Endow. 14, 11
(Jul 2021), 1922–1935.
[42] Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, and Torsten
Hoefler. 2019. Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and
Analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and
Analysis (SC ’19). ACM, Article 35, 25 pages.
[43] Bitnine Global Inc. 2016. AgensGraph. Available at https://bitnine.net/agensgraph/. (2016).
[44] Blazegraph. 2018. BlazeGraph DB. Available at https://www.blazegraph.com/. (2018).
[45] Guy E. Blelloch and Bruce M. Maggs. 2010. Parallel Algorithms. In Algorithms and Theory of Computation Handbook:
Special Topics and Techniques. Chapman & Hall/CRC.
[46] Scott Boag, Don Chamberlin, Mary F Fernández, Daniela Florescu, Jonathan Robie, Jérôme Siméon, and Mugur
Stefanescu. 2002. XQuery 1.0: An XML Query Language.
[47] Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A Multiresolution
Coordinate-Free Ordering for Compressing Social Networks. In Proceedings of the 20th International Conference on
World Wide Web (WWW ’11). ACM, 587–596.
[48] Angela Bonifati, George Fletcher, Hannes Voigt, and Nikolay Yakovets. 2018. Querying Graphs. Springer.
[49] Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology 25, 2 (2001),
163–177.
[50] Tim Bray. 2014. The JavaScript Object Notation (JSON) Data Interchange Format. RFC 7159. (2014).
[51] Tim Bray, Jean Paoli, C Michael Sperberg-McQueen, Eve Maler, and Franois Yergeau. 2000. Extensible Markup Language
(XML) 1.0.
[52] Jeen Broekstra, Arjohn Kampman, and Frank van Harmelen. 2002. Sesame: A Generic Architecture for Storing and
Querying RDF and RDF Schema. In Proceedings of the International Semantic Web Conference (ISWC ’02), Ian Horrocks
and James Hendler (Eds.). Lecture Notes in Computer Science, Vol. 2342. Springer, 54–68.
[53] Callidus Software Inc. 2010. OrientDB. Available at https://orientdb.org. (2010).
[54] Callidus Software Inc. 2018. OrientDB: Lightweight Edges. Available at https://orientdb.com/docs/3.0.x/java/
Lightweight-Edges.html. (2018).
[55] Cambridge Semantics. 2018. AnzoGraph. Available at https://www.cambridgesemantics.com/anzograph/. (2018).
[56] Mihai Capotă, Tim Hegeman, Alexandru Iosup, Arnau Prat-Pérez, Orri Erling, and Peter Boncz. 2015. Graphalytics: A
Big Data Benchmark for Graph-Processing Platforms. In Proceedings of the GRADES’15. ACM, Article 7, 6 pages.
[57] Arnaud Castelltort and Anne Laurent. 2013. Representing history in graph-oriented NoSQL databases: A versioning
system. In Proceedings of the Eighth International Conference on Digital Information Management (ICDIM ’13). IEEE,
228–234.
[58] Cayley. 2018. CayleyGraph. Available at https://cayley.io/ and https://github.com/cayleygraph/cayley. (2018).
[59] Ernie Chan, Marcel Heimlich, Avi Purkayastha, and Robert van de Geijn. 2007. Collective Communication: Theory,
Practice, and Experience: Research Articles. Concurr. Comput.: Pract. Exper. 19, 13 (Sept. 2007), 1749–1783.
[60] Jiefeng Cheng, Jeffrey Xu Yu, Bolin Ding, Philip S. Yu, and Haixun Wang. 2008. Fast Graph Pattern Matching. In
Proceedings of the 24th International Conference on Data Engineering (ICDE ’08). IEEE, 913–922.
[61] Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One Trillion
Edges: Graph Processing at Facebook-Scale. Proc. VLDB Endow. 8, 12 (Aug 2015), 1804–1815.
[62] Marek Ciglan, Alex Averbuch, and Ladialav Hluchy. 2012. Benchmarking Traversal Operations over Graph Databases.
In Proceedings of the 28th International Conference on Data Engineering Workshops (ICDEW ’12). IEEE, 186–189.
[63] James Clark and Steve DeRose. 1999. XML Path Language (XPath) Version 1.0.
[64] Edgar F. Codd. 1989. Relational Database: A Practical Foundation for Productivity. In Readings in Artificial Intelligence
and Databases. Elsevier, 60–68.
[65] Douglas Comer. 1979. The Ubiquitous B-Tree. ACM Comput. Surv. 11, 2 (June 1979), 121–137.
[66] Richard Cyganiak, David Wood, and Markus Lanthaler. 2014. RDF 1.1 Concepts and Abstract Syntax.

[67] George Bernard Dantzig and D. R. Fulkerson. 1955. On the Max Flow Min Cut Theorem of Networks. In RAND
Corporation Paper Series.
[68] DataStax Inc. 2018. DSE Graph (DataStax). Available at https://www.datastax.com/. (2018).
[69] Chris J. Date and Hugh Darwen. 1987. A Guide to the SQL Standard. Addison-Wesley.
[70] Ali Davoudian, Liu Chen, and Mengchi Liu. 2018. A Survey on NoSQL Stores. ACM Comput. Surv. 51, 2, Article 40
(April 2018), 43 pages.
[71] Dgraph Labs Inc. 2017. BadgerDB. Available at https://dbdb.io/db/badgerdb. (2017).
[72] Dgraph Labs, Inc. 2018. DGraph. Available at https://dgraph.io/ and https://dgraph.io/docs/. (2018).
[73] Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej
Besta, Luca Benini, Duncan Roweth, and Torsten Hoefler. 2019. Network-Accelerated Non-Contiguous Memory
Transfers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and
Analysis (SC ’19). ACM, Article 56, 14 pages.
[74] E. W. Dijkstra. 1959. A Note on Two Problems in Connexion with Graphs. Numer. Math. 1, 1 (Dec. 1959), 269–271.
[75] Niels Doekemeijer and Ana Lucia Varbanescu. 2014. A Survey of Parallel Graph Processing Frameworks. Technical
Report. Delft University of Technology.
[76] David Dominguez-Sal, Peter Urbón-Bayes, Aleix Giménez-Vanó, Sergio Gómez-Villamor, Norbert Martínez-Bazan, and
Josep-Lluis Larriba-Pey. 2010. Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark.
In Proccedings of the International Conference on Web-Age Information Management (WAIM ’10), Heng Tao Shen, Jian
Pei, M. Tamer Özsu, Lei Zou, Jiaheng Lu, Tok-Wang Ling, Ge Yu, Yi Zhuang, and Jie Shao (Eds.). Lecture Notes in
Computer Science, Vol. 6185. Springer, 37–48.
[77] Ayush Dubey, Greg D. Hill, Robert Escriva, and Emin Gün Sirer. 2016. Weaver: A High-Performance, Transactional
Graph Database Based on Refinable Timestamps. Proc. VLDB Endow. 9, 11 (July 2016), 852–863.
[78] Paul DuBois. 1999. MySQL. New Riders Publishing.
[79] William Eberle, Jeffrey Graves, and Lawrence Holder. 2010. Insider Threat Detection Using a Graph-Based Approach.
Journal of Applied Security Research 6, 1 (2010), 32–81.
[80] David Ediger, Rob McColl, Jason Riedy, and David A Bader. 2012. STINGER: High performance data structure for
streaming graphs. In Conference on High Performance Extreme Computing (HPEC ’12). IEEE, 1–5.
[81] Hamid El Maazouz, Guido Wachsmuth, Martin Sevenich, Dalila Chiadmi, Sungpack Hong, and Hassan Chafi. 2019. A
DSL-Based Framework for Performance Assessment. In Proceedings of the International Conference Europe Middle East
& North Africa Information Systems and Technologies to Support Learning (EMENA-ISTL ’19) (Learning and Analytics in
Intelligent System), Mohammed Serrhini, Carla Silva, and Sultan Aljahdali (Eds.), Vol. 7. Springer, 260–270.
[82] Elastic. 2010. Elasticsearch. Available at https://www.elastic.co/elasticsearch/. (2010).
[83] Ramez Elmasri and Shamkant B. Navathe. 2011. Advantages of Distributed Databases. In Fundamentals of Database
Systems (6th ed.). Pearson, Chapter 25.1.5, 882.
[84] Ramez Elmasri and Shamkant B. Navathe. 2011. Data Fragmentation. In Fundamentals of Database Systems (6th ed.).
Pearson, Chapter 25.4.1, 894–897.
[85] Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and
Peter Boncz. 2015. The LDBC Social Network Benchmark: Interactive Workload. In Proceedings of the International
Conference on Management of Data (SIGMOD ’15). ACM, 619–630.
[86] FactNexus. 2018. GraphBase. Available at https://graphbase.ai/. (2018).
[87] Jing Fan, Adalbert Gerald Soosai Raj, and Jignesh M. Patel. 2015. The Case Against Specialized Graph Analytics
Engines. In Proceedings of the 7th Biennial Conference on Innovative Data Systems Research (CIDR ’15).
[88] Fauna. 2016. FaunaDB. Available at https://fauna.com/. (2016).
[89] Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. 2023. KÙZU Graph Database Management
System. In Proceedings of the Conference on Innovative Data Systems Research (CIDR ’23).
[90] Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak,
Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoč. 2023. A Researcher’s Digest of GQL. In Proceedings of
the 26th International Conference on Database Theory (ICDT ’23), Floris Geerts and Brecht Vandevoort (Eds.). Leibniz
International Proceedings in Informatics (LIPIcs), Vol. 255. 1:1–1:22.
[91] Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow,
Mats Rydberg, Petra Selmer, and Andrés Taylor. 2018. Cypher: An Evolving Query Language for Property Graphs. In
Proceedings of the International Conference on Management of Data (SIGMOD ’18). ACM, 1433–1445.
[92] Franz Inc. 2018. AllegroGraph. Available at https://allegrograph.com/products/allegrograph/. (2018).
[93] Santhosh Kumar Gajendran. 2012. A Survey on NoSQL Databases. (2012).
[94] Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2002. Data Replication. In Database Systems: The
Complete Book (1st ed.). Pearson, Chapter 19.4.3, 1021.

[95] Ewout Gelling, George Fletcher, and Michael Schmidt. 2023. Bridging graph data models: RDF, RDF-star, and property
graphs as directed acyclic graphs. (2023). arXiv:2304.13097
[96] Evangelos Georganas, Jorge González-Domínguez, Edgar Solomonik, Yili Zheng, Juan Touriño, and Katherine Yelick.
2012. Communication Avoiding and Overlapping for Numerical Linear Algebra. In Proceedings of the International
Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12). IEEE, Article 100, 11 pages.
[97] Lars George. 2011. HBase: The Definitive Guide. O’Reilly Media.
[98] Robert Gerstenberger, Maciej Besta, and Torsten Hoefler. 2014. Enabling Highly-Scalable Remote Memory Access
Programming with MPI-3 One Sided. Scientific Programming 22, 2 (2014), 75–91.
[99] Lukas Gianinazzi, Pavel Kalvoda, Alessandro De Palma, Maciej Besta, and Torsten Hoefler. 2018. Communication-
Avoiding Parallel Minimum Cuts and Connected Components. SIGPLAN Not. 53, 1 (Feb 2018), 219–232.
[100] Niels Gleinig, Maciej Besta, and Torsten Hoefler. 2022. I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix
Multiplication. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS ’22). IEEE,
36–46.
[101] Google. 2018. Graphd. Available at https://github.com/google/graphd. (2018).
[102] Graph Story Inc. 2018. Graph Story. Available at https://github.com/graphstory. (2018).
[103] Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Martin Schuster,
Petra Selmer, and Hannes Voigt. 2019. Updating Graph Databases with Cypher. Proc. VLDB Endow. 12, 12 (Aug. 2019),
2242–2254.
[104] Alastair Green, Martin Junghanns, Max Kießling, Tobias Lindaaker, Stefan Plantikow, and Petra Selmer. 2018. open-
Cypher: New Directions in Property Graph Querying. In Proceedings of the Joint EDBT/ICDT Workshops (EDBT ’18).
520–523.
[105] L. Guzenda. 2000. Objectivity/DB - A high performance object database architecture. In Proceedings of the Workshop
on High Performance Object Databases (HIPOD ’00).
[106] Jing Han, E Haihong, Guan Le, and Jian Du. 2011. Survey on NoSQL database. In Proceedings of the 6th International
Conference on Pervasive Computing and Applications (ICPCA ’11). IEEE, 363–366.
[107] Minyang Han, Khuzaima Daudjee, Khaled Ammar, M. Tamer Özsu, Xingfang Wang, and Tianqi Jin. 2014. An
Experimental Comparison of Pregel-like Graph Processing Systems. Proc. VLDB Endow. 7, 12 (Aug. 2014), 1047–1058.
[108] Andreas Harth, Jürgen Umbrich, Aidan Hogan, and Stefan Decker. 2007. YARS2: A Federated Repository for Querying
Graph Structured Data from the Web. In Proceedings of the International Semantic Web Conference/Asian Semantic Web
Conference (ISWC/ASWC ’07), Karl Aberer, Key-Sun Choi, Natasha Noy, Dean Allemang, Kyung-Il Lee, Lyndon Nixon,
Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux (Eds.).
Lecture Notes in Computer Science, Vol. 4825. Springer, 211–224.
[109] Olaf Hartig. 2014. Reconciliation of RDF* and Property Graphs. (2014). arXiv:1409.3288
[110] Olaf Hartig. 2017. RDF* and SPARQL*: An Alternative Approach to Annotate Statements in RDF. In Proceedings of
the International Semantic Web Conference (Posters, Demos & Industry Track) (ISWC ’17).
[111] Olaf Hartig. 2019. Foundations to Query Labeled Property Graphs using SPARQL*. In Joint Proceedings of the 1st
International Workshop On Semantics For Transport and the 1st International Workshop on Approaches for Making Data
Interoperable (SEM4TRA-AMAR ’19).
[112] Olaf Hartig and Jorge Pérez. 2018. Semantics and Complexity of GraphQL. In Proceedings of the World Wide Web
Conference (WWW ’18). 1155–1164.
[113] Jonathan Hayes. 2004. A Graph Model for RDF. Diploma Thesis. Technische Universität Darmstadt, Universidad de
Chile.
[114] Joseph M. Hellerstein and Michael Stonebraker. 2005. Readings in Database Systems. MIT Press.
[115] Jeffrey A. Hoffer, Venkataraman Ramesh, and Heikki Topi. 2011. Modern Database Management. Pearson.
[116] Florian Holzschuher and René Peinl. 2013. Performance of Graph Query Languages: Comparison of Cypher, Gremlin
and Native Access in Neo4j. In Proceedings of the Joint EDBT/ICDT Workshops (EDBT ’13). ACM, 195–204.
[117] Sungpack Hong, Siegfried Depner, Thomas Manhardt, Jan Van Der Lugt, Merijn Verstraaten, and Hassan Chafi. 2015.
PGX.D: A Fast Distributed Graph Processing Engine. In Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis (SC ’15). ACM, Article 58, 12 pages.
[118] InfiniBand Trade Association. 2015. InfiniBand: Architecture Specification 1.3. (2015).
[119] InfoGrid. 2008. The InfoGrid Graph Database. Available at http://infogrid.org. (2008).
[120] Borislav Iordanov. 2010. HyperGraphDB: A Generalized Graph Database. In Proceedings of the International Conference
on Web-Age Information Management (WAIM ’10), Heng Tao Shen, Jian Pei, M. Tamer Özsu, Lei Zou, Jiaheng Lu,
Tok-Wang Ling, Ge Yu, Yi Zhuang, and Jie Shao (Eds.). Lecture Notes in Computer Science, Vol. 6185. Springer, 25–36.
[121] Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardt, Hassan
Chafi, Mihai Capotă, Narayanan Sundaram, Michael Anderson, Ilie Gabriel Tănase, Yinglong Xia, Lifeng Nai, and
Peter Boncz. 2016. LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed
Platforms. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1317–1328.
[122] Bin Jiang. 2011. A Short Note on Data-Intensive Geospatial Computing. In Information Fusion and Geographic
Information Systems, Vasily V. Popovich, Christophe Claramunt, Thomas Devogele, Manfred Schrenk, and Kyrill
Korolenko (Eds.). Lecture Notes in Geoinformation and Cartography, Vol. 5. Springer, 13–17.
[123] Alekh Jindal, Samuel Madden, Malú Castellanos, and Meichun Hsu. 2015. Graph Analytics Using Vertica Relational
Database. In Proceedings of the International Conference on Big Data (Big Data ’15). IEEE, 1191–1200.
[124] Salim Jouili and Valentin Vansteenberghe. 2013. An Empirical Comparison of Graph Databases. In Proceedings of the
International Conference on Social Computing (SocialCom ’13). IEEE, 708–715.
[125] Martin Junghanns, André Petermann, Martin Neumann, and Erhard Rahm. 2017. Management and Analysis of Big
Graph Data: Current Systems and Open Challenges. In Handbook of Big Data Technologies, Albert Y. Zomaya and
Sherif Sakr (Eds.). Springer, 457–505.
[126] Vasiliki Kalavri, Vladimir Vlassov, and Seif Haridi. 2017. High-Level Programming Abstractions for Distributed Graph
Processing. IEEE Transactions on Knowledge and Data Engineering 30, 2 (2017), 305–324.
[127] U. Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos. 2012. Gbase: An Efficient Analysis
Platform for Large Graphs. The VLDB Journal 21, 5 (Oct. 2012), 637–650.
[128] Chathura Kankanamge, Siddhartha Sahu, Amine Mhedbhi, Jeremy Chen, and Semih Salihoglu. 2017. Graphflow: An
Active Graph Database. In Proceedings of the International Conference on Management of Data (SIGMOD ’17). ACM,
1695–1698.
[129] Michael Kay. 2001. XSLT Programmer’s Reference. Wrox Press.
[130] Jeremy Kepner, Peter Aaltonen, David Bader, Aydın Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj
Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Carl Yang, John D. Owens, Marcin Zalewski,
Timothy Mattson, and Jose Moreira. 2016. Mathematical Foundations of the GraphBLAS. In High Performance Extreme
Computing Conference (HPEC ’16). IEEE, 1–9.
[131] Vaibhav Khadilkar, Murat Kantarcioglu, Bhavani Thuraisingham, and Paolo Castagna. 2012. Jena-HBase: A Dis-
tributed, Scalable and Efficient RDF Triple Store. In Proceedings of the International Semantic Web Conference (Posters &
Demonstrations Track) (ISWC ’12). 85–88.
[132] John Kim, William J. Dally, Steve Scott, and Dennis Abts. 2008. Technology-Driven, Highly-Scalable Dragonfly
Topology. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA ’08). IEEE, 77–88.
[133] Vojtech Kolomicenko. 2013. Analysis and Experimental Comparison of Graph Databases. Master Thesis. Charles
University, Prague.
[134] Joseph B. Kruskal. 1956. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proc.
Amer. Math. Soc. 7, 1 (1956), 48–50.
[135] Vijay Kumar and Anjan Babu. 2015. Domain Suitable Graph Database Selection: A Preliminary Report. In Proceedings
of the 3rd International Conference on Advances in Engineering Sciences & Applied Mathematics (ICAESAM ’15). 26–29.
[136] Rohit Kumar Kaliyar. 2015. Graph Databases: A Survey. In Proceedings of the International Conference on Computing,
Communication & Automation (ICCCA ’15). IEEE, 785–790.
[137] Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos
Ziogas, Maciej Besta, and Torsten Hoefler. 2021. Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O
Lower Bounds for Statically Analyzable Programs. In Proceedings of the 33rd Symposium on Parallelism in Algorithms
and Architectures (SPAA ’21). ACM, 328–339.
[138] Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, and Torsten Hoefler.
2021. On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization. In Proceedings of the
26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’21). 463–464.
[139] Grzegorz Kwasniewski, Marko Kabić, Maciej Besta, Joost VandeVondele, Raffaele Solcà, and Torsten Hoefler. 2019.
Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix-Matrix Multiplication. In Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19). ACM, Article 24, 22 pages.
[140] Avinash Lakshman and Prashant Malik. 2010. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper.
Syst. Rev. 44, 2 (April 2010), 35–40.
[141] LambdaZen LLC. 2017. Bitsy. Available at https://github.com/lambdazen/bitsy and https://bitbucket.org/lambdazen/
bitsy/wiki/Home. (2017).
[142] Ora Lassila, Michael Schmidt, Olaf Hartig, Brad Bebee, Dave Bechberger, Willem Broekema, Ankesh Khandelwal,
Kelvin Lawrence, Carlos Manuel Lopez Enriquez, Ronak Sharda, and Bryan Thompson. 2023. The OneGraph Vision:
Challenges of Breaking the Graph Model Lock-In. Semantic Web 14, 1 (2023), 125–134.
[143] Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler,
Xiaosong Ma, Xin Liu, Weimin Zheng, and Jingfang Xu. 2018. ShenTu: Processing Multi-Trillion Edge Graphs on
Millions of Cores in Seconds. In Proceedings of the International Conference for High Performance Computing, Networking,
Storage and Analysis (SC ’18). IEEE, Article 56, 11 pages.
[144] Linux Foundation. 2018. JanusGraph. Available at http://janusgraph.org/. (2018).
[145] Matteo Lissandrini, Martin Brugnara, and Yannis Velegrakis. 2017. An Evaluation Methodology and Experimental
Comparison of Graph Databases. Technical Report. University of Trento.
[146] Matteo Lissandrini, Martin Brugnara, and Yannis Velegrakis. 2018. Beyond Macrobenchmarks: Microbenchmark-Based
Graph Database Evaluation. Proc. VLDB Endow. 12, 4 (Dec. 2018), 390–403.
[147] Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan W. Berry. 2007. Challenges in Parallel Graph
Processing. Parallel Processing Letters 17, 1 (2007), 5–20.
[148] Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz
Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. In Proceedings of the International Conference on
Management of Data (SIGMOD ’10). ACM, 135–146.
[149] MariaDB. 2010. OQGRAPH. Available at https://mariadb.com/kb/en/oqgraph-storage-engine/. (2010).
[150] MarkLogic Corporation. 2005. MarkLogic. Available at https://www.marklogic.com. (2005).
[151] Norbert Martínez-Bazan, M. Ángel Águila Lorente, Victor Muntés-Mulero, David Dominguez-Sal, Sergio Gómez-
Villamor, and Josep-L. Larriba-Pey. 2012. Efficient Graph Management Based On Bitmap Indices. In Proceedings of the
16th International Database Engineering & Applications Symposium (IDEAS ’12). ACM, 110–119.
[152] József Marton, Gábor Szárnyas, and Dániel Varró. 2017. Formalising openCypher Graph Queries in Relational Algebra.
In Proceedings of the European Conference on Advances in Databases and Information Systems (ADBIS ’17), Mārı̄te
Kirikova, Kjetil Nørvåg, and George A. Papadopoulos (Eds.). Lecture Notes in Computer Science, Vol. 10509. Springer,
182–196.
[153] Kristyn J. Maschhoff, Robert Vesse, Sreenivas R. Sukumar, Michael F. Ringenburg, and James Maltby. 2017. Quantifying
Performance of CGE: A Unified Scalable Pattern Mining and Search System. In Proceedings of the Cray User Group
(CUG ’17).
[154] Robert Campbell McColl, David Ediger, Jason Poovey, Dan Campbell, and David A. Bader. 2014. A Performance
Evaluation of Open Source Graph Databases. In Proceedings of the First Workshop on Parallel Programming for Analytics
Applications (PPAA ’14). ACM, 11–18.
[155] Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking Like a Vertex: A Survey of Vertex-Centric
Frameworks for Large-Scale Distributed Graph Processing. ACM Comput. Surv. 48, 2, Article 25 (Oct. 2015), 39 pages.
[156] Memgraph Ltd. 2018. Memgraph. Available at https://memgraph.com/. (2018).
[157] Scott M. Meyer, Jutta Degener, John Giannandrea, and Barak Michener. 2010. Optimizing Schema-Last Tuple-
Store Queries in Graphd. In Proceedings of the International Conference on Management of Data (SIGMOD ’10). ACM,
1047–1056.
[158] Amine Mhedhbi, Matteo Lissandrini, Laurens Kuiper, Jack Waudby, and Gábor Szárnyas. 2021. LSQB: A Large-Scale
Subgraph Query Benchmark. In Proceedings of the 4th ACM SIGMOD Joint International Workshop on Graph Data
Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-NDA ’21). Article 8,
11 pages.
[159] Amine Mhedhbi and Semih Salihoglu. 2019. Optimizing Subgraph Queries by Combining Binary and Worst-Case
Optimal Joins. Proc. VLDB Endow. 12, 11 (July 2019), 1692–1704.
[160] Microsoft. 2017. Microsoft SQL Server 2017. Available at https://www.microsoft.com/en-us/sql-server/sql-server-2017.
(2017).
[161] Microsoft. 2018. Azure Cosmos DB. Available at https://azure.microsoft.com/en-us/services/cosmos-db/. (2018).
[162] Bruce Momjian. 2001. PostgreSQL: Introduction and Concepts. Addison-Wesley.
[163] Thomas Mueller. 2005. H2 Database Engine. Available at http://www.h2database.com. (2005).
[164] Networked Planet Limited. 2018. BrightstarDB. Available at http://brightstardb.com/. (2018).
[165] Thomas Neumann and Gerhard Weikum. 2010. X-RDF-3X: Fast Querying, High Update Rates, and Consistency for
RDF Databases. Proc. VLDB Endow. 3, 1–2 (Sept. 2010), 256–263.
[166] Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2018. Worst-Case Optimal Join Algorithms. J. ACM 65, 3,
Article 16 (March 2018), 40 pages.
[167] Objectivity Inc. 2018. ThingSpan. Available at https://www.objectivity.com/products/thingspan/. (2018).
[168] Mike Olson, Keith Bostic, and Margo Seltzer. 1999. Berkeley DB. In Proceedings of the USENIX Annual Technical
Conference (ATC ’99).
[169] Ontotext. 2018. GraphDB. Available at https://www.ontotext.com/products/graphdb/. (2018).
[170] OpenLink. 2018. Virtuoso. Available at https://virtuoso.openlinksw.com/. (2018).
[171] Oracle. 2018. Oracle Spatial and Graph. Available at https://www.oracle.com/database/technologies/spatialandgraph.
html. (2018).
[172] Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun. 2015. Making Sense of
Performance in Data Analytics Frameworks. In Proceedings of the 12th USENIX Symposium on Networked Systems
Design and Implementation (NSDI ’15). 293–307.
[173] M. Tamer Özsu. 2016. A Survey of RDF Data Management Systems. Frontiers of Computer Science 10, 3 (2016), 418–432.
[174] Anil Pacaci, Alice Zhou, Jimmy Lin, and M. Tamer Özsu. 2017. Do We Need Specialized Graph Databases? Bench-
marking Real-Time Social Networking Applications. In Proceedings of the Fifth International Workshop on Graph
Data-Management Experiences & Systems (GRADES ’17). ACM, Article 12, 7 pages.
[175] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing
Order to the Web. Technical Report. Stanford InfoLab.
[176] N.S. Patil, P. Kiran, N.P. Kavya, and K.M. Naresh Patel. 2018. A Survey on Graph Database Management Techniques
for Huge Unstructured Data. International Journal of Electrical and Computer Engineering 81, 2 (2018), 1140–1149.
[177] Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2009. Semantics and Complexity of SPARQL. ACM Trans.
Database Syst. 34, 3, Article 16 (Sept. 2009), 45 pages.
[178] Hasso Plattner. 2009. A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database. In
Proceedings of the International Conference on Management of Data (SIGMOD ’09). ACM, 1–2.
[179] Tomasz Pluskiewicz. 2022. RDF String. Available at https://github.com/tpluscode/rdf-string. (2022).
[180] Jaroslav Pokornỳ. 2015. Graph Databases: Their Power and Limitations. In Proceedings of the IFIP International
Conference on Computer Information Systems and Industrial Management (CISIM ’15), Khalid Saeed and Wladyslaw
Homenda (Eds.). Lecture Notes in Computer Science, Vol. 9339. Springer, 58–69.
[181] Profium. 2018. Profium Sense. Available at https://www.profium.com/en/products/graph-database/. (2018).
[182] Roshan Punnoose, Adina Crainiceanu, and David Rapp. 2012. Rya: A Scalable RDF Triple Store for the Clouds. In
Proceedings of the 1st International Workshop on Cloud Intelligence (Cloud-I ’12). ACM, Article 4, 8 pages.
[183] Ashish Rana. 2019. Detailed Introduction: Redis Modules, from Graphs to Machine Learning (Part 1). Available at
https://medium.com/@ashishrana160796/15ce9ff1949f. (2019).
[184] Michel Raynal. 2013. Using Semaphores to Solve the Readers-Writers Problem. In Concurrent Programming: Algorithms,
Principles, and Foundations. Springer, Chapter 3.2.4, 74–78.
[185] Redis Labs. 2009. Redis. Available at https://redis.io/. (2009).
[186] Redis Labs. 2018. RedisGraph. Available at https://redis.io/docs/stack/graph/. (2018).
[187] Christopher D. Rickett, Utz-Uwe Haus, James Maltby, and Kristyn J. Maschhoff. 2018. Loading and Querying a Trillion
RDF triples with Cray Graph Engine on the Cray XC. In Proceedings of the Cray User Group (CUG ’18).
[188] Ian Robinson, Jim Webber, and Emil Eifrem. 2015. Graph Database Internals. In Graph Databases (2nd ed.). O’Reilly,
Chapter 7, 149–170.
[189] Marko A. Rodriguez. 2015. The Gremlin Graph Traversal Machine and Language. In Proceedings of the 15th Symposium
on Database Programming Languages (DBPL ’15). ACM, 1–10.
[190] Shahin Roozkhosh, Denis Hoornaert, Ju Hyoung Mun, Tarikul Islam Papon, Ahmed Sanaullah, Ulrich Drepper, Renato
Mancuso, and Manos Athanassoulis. 2021. Relational Memory: Native In-Memory Accesses on Rows and Columns.
(2021). arXiv:2109.14349
[191] Nicholas P. Roth, Vasileios Trigonakis, Sungpack Hong, Hassan Chafi, Anthony Potter, Boris Motik, and Ian Horrocks.
2017. PGX.D/Async: A Scalable Distributed Graph Pattern Matching Engine. In Proceedings of the Fifth International
Workshop on Graph Data-Management Experiences & Systems (GRADES ’17). ACM, Article 7, 6 pages.
[192] Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out Graph
Processing from Secondary Storage. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP ’15).
ACM, 410–424.
[193] Michael Rudolf, Marcus Paradies, Christof Bornhövd, and Wolfgang Lehner. 2013. The Graph Story of the SAP HANA
Database. In Datenbanksysteme für Business, Technologie und Web, Volker Markl, Gunter Saake, Kai-Uwe Sattler, Gregor
Hackenbroich, Bernhard Mitschang, Theo Härder, and Veit Köppen (Eds.). Gesellschaft für Informatik e.V., 403–420.
[194] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M. Tamer Özsu. 2017. The Ubiquity of Large
Graphs and Surprising Challenges of Graph Processing. Proc. VLDB Endow. 11, 4 (Dec. 2017), 420–431.
[195] SAP. 2010. SAP HANA. Available at https://www.sap.com/products/technology-platform/hana.html. (2010).
[196] Yuya Sasaki, George Fletcher, and Makoto Onizuka. 2022. Language-Aware Indexing for Conjunctive Path Queries. In Proceedings of
the 38th International Conference on Data Engineering (ICDE ’22). IEEE, 661–673.
[197] Satu Elisa Schaeffer. 2007. Survey: Graph Clustering. Computer Science Review 1, 1 (Aug. 2007), 27–64.
[198] Patrick Schmid, Maciej Besta, and Torsten Hoefler. 2016. High-Performance Distributed RMA Locks. In Proceedings of
the 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’16). ACM, 19–30.
[199] Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a Memory Cloud. In Proceedings
of the International Conference on Management of Data (SIGMOD ’13). ACM, 505–516.
[200] Sanjay Sharma. 2014. Cassandra Design Patterns. Packt Publishing.
[201] Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and Qiang-Sheng Hua. 2018. Graph
Processing on GPUs: A Survey. ACM Comput. Surv. 50, 6, Article 81 (Jan. 2018), 35 pages.
[202] Komal Singh and Vikram Singh. 2016. Graph Pattern Matching: A Brief Survey of Challenges and Research Directions.
In Proceedings of the 3rd International Conference on Computing for Sustainable Global Development (INDIACom ’16).
199–204.
[203] solid IT GmbH. 2023. System Properties Comparison: Neo4j vs. Redis. Available at https://db-engines.com/en/system/
Neo4j%3BRedis. (2023).
[204] Edgar Solomonik, Maciej Besta, Flavio Vella, and Torsten Hoefler. 2017. Scaling Betweenness Centrality Using
Communication-Efficient Sparse Matrix Multiplication. In Proceedings of the International Conference for High Perfor-
mance Computing, Networking, Storage and Analysis (SC ’17). ACM, Article 47, 14 pages.
[205] Edgar Solomonik, Erin Carson, Nicholas Knight, and James Demmel. 2014. Tradeoffs Between Synchronization,
Communication, and Work in Parallel Linear Algebra Computations. Technical Report.
[206] Edgar Solomonik and Torsten Hoefler. 2015. Sparse Tensor Algebra as a Parallel Programming Model. (2015).
arXiv:1512.00066
[207] Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2018. GraphR: Accelerating Graph Processing
Using ReRAM. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA ’18).
IEEE, 531–543.
[208] Stardog Union. 2018. Stardog. Available at https://www.stardog.com/. (2018).
[209] Benjamin A. Steer, Alhamza Alnaimi, Marco A. B. F. G. Lotz, Felix Cuadrado, Luis M. Vaquero, and Joan Varvenne.
2017. Cytosm: Declarative Property Graph Queries Without Data Migration. In Proceedings of the Fifth International
Workshop on Graph Data-Management Experiences & Systems (GRADES ’17). ACM, Article 4, 6 pages.
[210] Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, and Guotong Xie. 2015. SQLGraph:
An Efficient Relational-Based Property Graph Store. In Proceedings of the International Conference on Management of
Data (SIGMOD ’15). ACM, 1887–1901.
[211] Narayanan Sundaram, Nadathur Satish, Md. Mostofa Ali Patwary, Subramanya R. Dulloor, Michael J. Anderson,
Satya Gautam Vadlamudi, Dipankar Das, and Pradeep Dubey. 2015. GraphMat: High Performance Graph Analytics
Made Productive. Proc. VLDB Endow. 8, 11 (July 2015), 1214–1225.
[212] Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton, Marcus Paradies, Moritz Kaufmann, Orri Erling,
Peter Boncz, Vlad Haprian, and János Benjamin Antal. 2018. An Early Look at the LDBC Social Network Benchmark’s
Business Intelligence Workload. In Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data
Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-NDA ’18). Article 9,
11 pages.
[213] Gábor Szárnyas, Jack Waudby, Benjamin A. Steer, Dávid Szakállas, Altan Birler, Mingxi Wu, Yuchen Zhang, and Peter
Boncz. 2022. The LDBC Social Network Benchmark: Business Intelligence Workload. Proc. VLDB Endow. 16, 4 (Dec.
2022), 877–890.
[214] Ruben Taelman. 2022. RDF String Turtle. Available at https://github.com/rubensworks/rdf-string-ttl.js. (2022).
[215] Adrian Tate, Amir Kamil, Anshu Dubey, Armin Größlinger, Brad Chamberlain, Brice Goglin, H. Edwards, Chris
Newburn, David Padua, Didem Unat, Emmanuel Jeannot, Frank Hannig, Tobias Gysi, Hatem Ltaief, James Sexton,
Jesús Labarta, John Shalf, Karl Fürlinger, Kathryn O’Brien, and Vitus Leung. 2014. Programming Abstractions for Data
Locality. In PADAL Workshop.
[216] Daniel ten Wolde, Tavneet Singh, Gábor Szárnyas, and Peter Boncz. 2023. DuckPGQ: Efficient Property Graph Queries
in an analytical RDBMS. In Proceedings of the Conference on Innovative Data Systems Research (CIDR ’23).
[217] Yuanyuan Tian, En Liang Xu, Wei Zhao, Mir Hamid Pirahesh, Sui Jun Tong, Wen Sun, Thomas Kolanko, Md.
Shahidul Haque Apu, and Huijuan Peng. 2020. IBM Db2 Graph: Supporting Synergistic and Retrofittable Graph Queries
Inside IBM Db2. In Proceedings of the International Conference on Management of Data (SIGMOD ’20). ACM, 345–359.
[218] TigerGraph Inc. 2018. TigerGraph. Available at https://www.tigergraph.com/. (2018).
[219] Twitter. 2010. FlockDB. Available at https://github.com/twitter-archive/flockdb. (2010).
[220] Aparna Vaikuntam and Vinodh Kumar Perumal. 2014. Evaluation of Contemporary Graph Databases. In Proceedings
of the 7th India Computing Conference (COMPUTE ’14). ACM, Article 6, 10 pages.
[221] Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. 2016. PGQL: A Property Graph Query
Language. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems
(GRADES ’16). ACM, Article 7, 6 pages.
[222] VelocityDB Inc. 2019. VelocityDB. Available at https://velocitydb.com/. (2019).
[223] VelocityDB Inc. 2019. VelocityGraph. Available at https://velocitydb.com/QuickStartVelocityGraph. (2019).
[224] WhiteDB Team. 2013. WhiteDB. Available at http://whitedb.org/ or https://github.com/priitj/whitedb. (2013).
[225] Konstantinos Xirogiannopoulos, Virinchi Srinivas, and Amol Deshpande. 2017. GraphGen: Adaptive Graph Processing
Using Relational Databases. In Proceedings of the Fifth International Workshop on Graph Data-Management Experiences
& Systems (GRADES ’17). ACM, Article 9, 7 pages.
[226] Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, and James Cheng. 2016. Big Graph Analytics Systems. In
Proceedings of the International Conference on Management of Data (SIGMOD ’16). ACM, 2241–2243.
[227] Robert Yokota. 2016. HGraphDB. Available at https://github.com/rayokota/hgraphdb. (2016).
[228] Pingpeng Yuan, Pu Liu, Buwen Wu, Hai Jin, Wenya Zhang, and Ling Liu. 2013. TripleBit: A Fast and Compact System
for Large Scale RDF Data. Proc. VLDB Endow. 6, 7 (May 2013), 517–528.
[229] Kangfei Zhao and Jeffrey Xu Yu. 2017. All-in-One: Graph Processing in RDBMSs Revisited. In Proceedings of the
International Conference on Management of Data (SIGMOD ’17). ACM, 1165–1180.
[230] Xiaowei Zhu, Guanyu Feng, Marco Serafini, Xiaosong Ma, Jiping Yu, Lei Xie, Ashraf Aboulnaga, and Wenguang Chen.
2020. LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans. Proc. VLDB
Endow. 13, 7 (March 2020), 1020–1034.
[231] Lei Zou, M. Tamer Özsu, Lei Chen, Xuchuan Shen, Ruizhe Huang, and Dongyan Zhao. 2014. GStore: A Graph-Based
SPARQL Query Engine. The VLDB Journal 23, 4 (Aug. 2014), 565–590.