Graph Databases: A Comprehensive Survey
Graph processing has become an important part of multiple areas of computer science, such as machine
learning, computational sciences, medical applications, social network analysis, and many others. Numerous
graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also
dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and
edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving,
and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing,
these systems face unique design challenges. To facilitate the understanding of this emerging domain, we
present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing
fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or
object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization
techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects
of data distribution and query execution (e.g., support for sharding and ACID). We present and
compare 51 graph database systems, including Neo4j, OrientDB, and Virtuoso. We also outline graph
database queries and workloads, as well as the relationships between graph databases and associated
domains (NoSQL stores, graph streaming, and dynamic graph algorithms).
Finally, we describe research and engineering challenges to outline the future of graph databases.
CCS Concepts: • General and reference → Surveys and overviews; • Information systems → Data
management systems; Graph-based database models; Data structures; DBMS engine architectures;
Database query processing; Parallel and distributed DBMSs; Database design and models; Distributed
database transactions; • Theory of computation → Data modeling; Data structures and algorithms for data
management; Distributed algorithms; • Computer systems organization → Distributed architectures;
Additional Key Words and Phrases: Graphs, Graph Databases, NoSQL Stores, Graph Database Management
Systems, Graph Models, Data Layout, Graph Queries, Graph Transactions, Graph Representations, RDF,
Labeled Property Graph, Triple Stores, Key-Value Stores, RDBMS, Wide-Column Stores, Document Stores
1 INTRODUCTION
Graph processing is behind numerous problems in computing, for example in medicine, machine
learning, computational sciences, and others [122, 147]. Graph algorithms are inherently difficult
to design because of challenges such as large sizes of processed graphs, little locality, or irregular
communication [42, 61, 143, 147, 192, 215]. The difficulties are increased by the fact that many
such graphs are also dynamic (their structure changes over time) and have rich data, for example
arbitrary properties or labels, associated with vertices and edges.
Graph databases (GDBs) such as Neo4j [188] emerged to enable storing, processing, and analyzing
large, evolving, and rich graph datasets. Graph databases face unique challenges due to overall
properties of irregular graph computations combined with the demand for low latency and high
throughput of graph queries that can be both local (i.e., accessing or modifying a small part of the
graph, for example a single edge) and global (i.e., accessing or modifying a large part of the graph,
for example all the edges). Many of these challenges belong to the following areas: “general design”
(i.e., what is the most advantageous general structure of a graph database engine), “data models
and organization” (i.e., how to model and store the underlying graph dataset), “data distribution”
(i.e., whether and how to distribute the data across multiple servers), and “transactions and queries”
(i.e., how to query the underlying graph dataset to extract useful information). This distinction
is illustrated in Figure 1. In this work, we present the first survey and taxonomy on these system
aspects of graph databases.
[Figure 1: Overview of this survey. It depicts the taxonomy and key dimensions of graph databases (§ 5): storage backend (§ 5.1), data organization (§ 5.2), and query execution (§ 5.3); the design details of selected graph databases (§ 6): RDF stores (§ 6.2), tuple stores (§ 6.3), key-value stores (§ 6.4), document stores (§ 6.5), wide-column stores (§ 6.6), RDBMS for graphs (§ 6.7), object-oriented databases (§ 6.8), and LPG graph stores (§ 6.9); the data models used in graph databases (§ 3), including conceptual graph data models (§ 3.3), graph structure representations (§ 3.2), and non-graph data models (§ 3.4); and the relationships between graph databases and other classes of graph systems and surveys (§ 2). A dedicated symbol marks categories surveyed in other publications.]
• We discuss in detail the design of selected graph databases.
• We outline related domains, such as queries and workloads in graph databases.
• We discuss future challenges in the design of graph databases.
[Figure: Types of database systems considered in this survey, including RDF systems, native graph stores, wide-column stores, and others; some systems are no longer actively developed. RDF systems can be implemented as NoSQL stores or as traditional RDBMS.]
graph streaming frameworks focus on simple graph models where edges or vertices may have
weights and, in some cases, simple additional properties such as timestamps. Moreover, challenges
in the design of graph databases include transactional support, persistence, physical/logical data
independence, data integrity, and consistency; these topics bear little relation to graph streaming
frameworks.
𝐺 A graph 𝐺 = (𝑉, 𝐸), where 𝑉 is a set of vertices and 𝐸 is a set of edges.
𝑛, 𝑚 The counts of vertices and edges in a graph 𝐺; |𝑉| = 𝑛, |𝐸| = 𝑚.
𝑑, 𝑑̂ The average degree and the maximum degree in a given graph, respectively.
P(𝑆) = 2^𝑆 The power set of 𝑆: a set that contains all possible subsets of 𝑆.
AM, M The Adjacency Matrix representation; M ∈ {0, 1}^(𝑛×𝑛), M_(𝑢,𝑣) = 1 ⇔ (𝑢, 𝑣) ∈ 𝐸.
AL, 𝐴_𝑢 The Adjacency List representation and the adjacency list of a vertex 𝑢; 𝑣 ∈ 𝐴_𝑢 ⇔ (𝑢, 𝑣) ∈ 𝐸.
LPG, RDF Labeled Property Graph (§ 3.3.2) and Resource Description Framework (§ 3.3.4).
KV, RDBMS Key-Value store (§ 6.4) and Relational Database Management Systems (§ 6.7).
OODBMS Object-Oriented Database Management Systems (§ 6.8).
OLTP, OLAP Online Transaction Processing (§ 4.1) and Online Analytics Processing (§ 4.1).
ACID Transaction guarantees (Atomicity, Consistency, Isolation, Durability).
Table 1. The most relevant symbols and abbreviations used in this work.
Fig. 3. Illustration of fundamental graph representations: Adjacency Matrix (an n × n matrix; a cell is one bit for an unweighted graph or one integer for a weighted graph), Adjacency List (pointers from vertices to their neighborhoods, which are stored in structures such as arrays or lists and can be sorted or unsorted), and Edge List (2m tuples for an undirected graph, m for a directed one).
subset of labels. Next, a vertex as well as an edge can be associated with any number of properties.
We model a property as a key-value pair 𝑝 = (𝑘𝑒𝑦, 𝑣𝑎𝑙𝑢𝑒), where 𝑘𝑒𝑦 ∈ 𝐾 and 𝑣𝑎𝑙𝑢𝑒 ∈ 𝑊; 𝐾 and 𝑊
are the sets of all possible keys and values. Finally, 𝑝𝑉(𝑢) denotes the set of property key-value pairs
of the vertex 𝑢, and 𝑝𝐸(𝑒) denotes the set of property key-value pairs of the edge 𝑒. An example LPG is
shown in Figure 4. Note that, in LPGs, 𝐸 may be a multi-set (i.e., there may be more than a single edge
between two vertices, even with identical labels and/or key-value sets). All systems considered in this
work use some variant of the LPG, with the exception of RDF systems or when explicitly discussed.
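To make the model concrete, the following is a minimal sketch of an LPG in Python (all class and field names are illustrative, not taken from any particular system):

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: int
    labels: set[str] = field(default_factory=set)                 # a subset of L
    properties: dict[str, object] = field(default_factory=dict)   # p_V(u)

@dataclass
class Edge:
    src: int
    dst: int
    labels: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)   # p_E(e)

# E is a list rather than a set: LPGs allow multi-edges, so two edges with
# identical endpoints, labels, and properties may coexist.
vertices = {1: Vertex(1, {"Person"}, {"name": "Alice", "age": 21}),
            2: Vertex(2, {"Person"}, {"name": "Bob", "age": 24})}
edges = [Edge(1, 2, {"knows"}, {"since": "09.08.2007"})]
```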
Fig. 4. Illustration of an example Labeled Property Graph (LPG). Vertices and edges can have labels (bold,
prefixed with a colon) and properties (key = value). We present a subgraph of a social network, where a person can know
other persons, post messages, and comment on others’ messages.
3.3.3 Variants of Labeled Property Graph Model. Several databases support variants of LPG. First,
Neo4j [188] (a graph database described in detail in § 6.9.1) supports an arbitrary number of labels
for vertices. However, it only allows for one label (called an edge type) per edge. Next, ArangoDB [18]
(a graph database described in detail in § 6.5.2) only allows for one label per vertex (a vertex type) and
one label per edge (an edge type). This facilitates the separation of vertices and edges into different
document collections. Moreover, edge-labeled graphs [8] do not allow for any properties and use
labels in a restricted way: only edges have labels, and each edge has exactly one label.
Formally, 𝐺 = (𝑉 , 𝐸, 𝐿), where 𝑉 is the set of vertices and 𝐸 ⊆ 𝑉 × 𝐿 × 𝑉 is the set of edges. Note
that this definition enables two vertices to be connected by multiple edges with different labels.
Finally, some effort was dedicated to LPG variants that facilitate storing historical graph data [57].
3.3.4 Resource Description Framework. The Resource Description Framework (RDF) [66] is a collection of
specifications for representing information. It was introduced by the World Wide Web Consortium
(W3C) in 1999; the latest version (1.1) of the RDF specification was published in 2014. Its goal is
to provide a simple format that enables easy data exchange between systems that use different
formats of data. It is especially useful for describing irregularly connected data. The core part of the RDF
model is a collection of triples. Each triple consists of a subject, a predicate, and an object. Thus, RDF
databases are also often called triple stores (or triplestores). Subjects can either be identifiers (called
Uniform Resource Identifiers (URIs)) or blank nodes (which are dummy identifiers for internal
use). Objects can be URIs, blank nodes, or literals (which are simple values). With triples, one can
connect identifiers with identifiers or identifiers with literals. The connections are named with
another URI (the predicate). RDF triples can be formally described as

(𝑠, 𝑝, 𝑜) ∈ (𝑈𝑅𝐼 ∪ 𝑏𝑙𝑎𝑛𝑘) × 𝑈𝑅𝐼 × (𝑈𝑅𝐼 ∪ 𝑏𝑙𝑎𝑛𝑘 ∪ 𝑙𝑖𝑡𝑒𝑟𝑎𝑙),

where 𝑠 represents a subject, 𝑝 models a predicate, and 𝑜 represents an object. 𝑈𝑅𝐼 is the set of Uniform
Resource Identifiers; 𝑏𝑙𝑎𝑛𝑘 is the set of blank node identifiers, which internally substitute for URIs to allow
for more complex data structures; 𝑙𝑖𝑡𝑒𝑟𝑎𝑙 is the set of literal values [113, 173].
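As a minimal illustration (hypothetical URIs; Python), a triple store can be viewed as a set of (s, p, o) tuples queried by pattern matching:

```python
# A tiny in-memory triple store: a set of (subject, predicate, object) tuples.
triples = {
    ("https://example.org/alice", "https://example.org/knows", "https://example.org/bob"),
    ("https://example.org/alice", "https://example.org/name", "Alice"),  # the object is a literal
    ("https://example.org/bob", "https://example.org/name", "Bob"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Whom does Alice know?
print(match(s="https://example.org/alice", p="https://example.org/knows"))
```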
3.3.5 Transformations between LPG and RDF. To represent a Labeled Property Graph in the RDF
model, LPG vertices are mapped to URIs (❶) and then RDF triples are used to link those vertices with
their LPG properties by representing a property key and a property value with, respectively, an RDF
predicate and an RDF object (❷). For example, for a vertex with an ID vertex-id and a corresponding
property with a key property-key and a value property-value, one creates an RDF triple (vertex-id,
property-key, property-value). Similarly, one can represent edges from the LPG graph model in the
RDF model by giving each edge the URI status (❸), and by linking edge properties with specific
edges analogously to vertices: (edge-id, property-key, property-value) (❹). Then, one has to use two
triples to connect each edge to any of its adjacent vertices (❺). Finally, LPG labels can also be
transformed into RDF triples in a way similar to that of properties [27], by creating RDF triples for
vertices (❻) and edges (❼) such that the predicate becomes a “label” URI and contains the string
name of this label. Figure 5 shows an example of transforming an LPG graph into RDF triples. More
details on transformations between LPG and RDF are provided by Hartig [111].
Fig. 5. Comparison of an LPG and an RDF graph: a transformation from LPG to RDF. “V-ID”, “E-ID”, “age”, “name”,
“type”, “from”, “to”, “since” and “label” are RDF URIs. Numbers in black circles refer to transformation steps in § 3.3.5.
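The steps ❶–❼ above can be sketched as follows (a simplified Python illustration, reusing the hypothetical Vertex and Edge classes from the sketch in § 3.3.2 and plain strings as URIs):

```python
def lpg_to_rdf(vertices, edges):
    triples = set()
    for v in vertices.values():                        # (1) vertices become URIs
        vid = f"vertex/{v.id}"
        for key, value in v.properties.items():        # (2) vertex properties
            triples.add((vid, key, value))
        for label in v.labels:                         # (6) vertex labels
            triples.add((vid, "label", label))
    for i, e in enumerate(edges):
        eid = f"edge/{i}"                              # (3) edges get URI status
        for key, value in e.properties.items():        # (4) edge properties
            triples.add((eid, key, value))
        triples.add((eid, "from", f"vertex/{e.src}"))  # (5) two triples connect an
        triples.add((eid, "to", f"vertex/{e.dst}"))    #     edge to its vertices
        for label in e.labels:                         # (7) edge labels
            triples.add((eid, "label", label))
    return triples
```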
If all vertices and edges only have one label, one can omit the triples for labels and store the label
(e.g., “Person”) together with the vertex or the edge name (“V-ID” and “E-ID”) in the identifier. We
illustrate a corresponding example in Figure 6.
Fig. 6. Comparison of an LPG and an RDF graph: a transformation from LPG to RDF, given vertices and edges have
only one label. “Person/V-ID”, “knows/E-ID”, “age”, “name”, “type”, “from”, “to” and “since” are RDF URIs.
Transforming RDF data into the LPG model is more complex, since RDF predicates, which would
normally be translated into edges, are URIs. Thus, while deriving an LPG graph from an RDF graph,
one must map edges to vertices and link such vertices, otherwise the resulting LPG graph may
be disconnected. There are several schemes for such an RDF to LPG transformation, for example
deriving an LPG graph which is bipartite, at the cost of an increased graph size [113]. Details and
examples are provided in a report by Hayes [113].
3.4 Non-Graph Data Models and Storage Schemes Used in Graph Databases
In addition to the conceptual graph models, graph databases also often incorporate different storage
schemes and data models that do not target specifically graphs but are used in various systems
to model and store graphs. These models include collections of key-value pairs, documents, and
tuples (used in different types of NoSQL stores), relations and tables (used in traditional relational
databases), and objects (used in object-oriented databases). Different details of these models and the
database systems based on them are described in other surveys, for example in a recent publication
on NoSQL stores by Davoudian et al. [70]. Thus, we omit extensive discussions and instead offer
brief summaries, focusing on how they are used to model or represent graphs.
3.4.1 Collection of Key-Value Pairs. Key-value stores are the simplest NoSQL stores [70]. Here,
the data is stored as a collection of key-value pairs, with a focus on high-performance and highly
scalable lookups based on keys. The exact form of both keys and values depends on the specific
system or application. Keys can be simple (e.g., a URI or a hash) or structured. Values are often
encoded as byte arrays (i.e., the structure of values is usually schema-less). However, a key-value
store can also impose some additional data layout, structuring the schema-less values [70].
Due to the general nature of key-value stores, there can be many ways of representing a graph as
a collection of KV pairs. We describe several concrete example systems [72, 120, 186, 199] in § 6.4.
For example, one can use vertex labels as keys and encode the neighborhoods of vertices as values.
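For instance, the following sketch (hypothetical key layout; Python, with a dictionary standing in for the store) encodes each vertex's neighborhood and properties as a schema-less byte-array value:

```python
import json

kv = {}  # stands in for a distributed key-value store

def put_vertex(vertex_id, neighbors, properties):
    # The store sees only an opaque byte array; here, the adjacency list
    # and the properties are serialized into the value as JSON.
    kv[f"v:{vertex_id}"] = json.dumps(
        {"adj": neighbors, "props": properties}).encode()

def get_neighbors(vertex_id):
    return json.loads(kv[f"v:{vertex_id}"])["adj"]

put_vertex(1, [2, 3], {"name": "Alice"})
put_vertex(2, [1], {"name": "Bob"})
print(get_neighbors(1))  # [2, 3]
```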
3.4.2 Collection of Documents. A document is a fundamental storage unit in a class of NoSQL
databases called document stores [70]. These documents are stored in collections. Multiple
collections of documents constitute a database. A document is encoded using a selected standard
semi-structured format, e.g., JSON [50] or XML [51]. Document stores extend key-value stores in
that a document can be seen as a value that has a certain flexible schema. This schema consists of
attributes, where each attribute has a name along with one or more values. Such a structure based
on documents with attributes allows for various value types, key-value pair storage, and recursive
data storage (attribute values can be lists or key-value dictionaries).
In all surveyed document stores [18, 53, 88, 141, 161] (§ 6.5), each vertex is stored in a vertex
document. The capability of documents to store key-value pairs is used to store vertex labels and
properties within the corresponding vertex document. The details of edge storage, however, are
system-dependent: edges can be stored in the document corresponding to the source vertex of each
edge, or in the documents of the destination vertices. As documents do not impose any restriction
on what key-value pairs can be stored, vertices and edges may have different sets of properties.
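A sketch of one such layout (illustrative attribute names; Python dictionaries stand in for JSON documents), with edges embedded in the document of their source vertex:

```python
# A vertex collection: labels and properties are plain key-value attributes,
# and outgoing edges (with their own properties) are nested in the document.
vertex_collection = [
    {"_id": "v1", "label": "Person", "name": "Alice", "age": 21,
     "out_edges": [{"to": "v2", "label": "knows", "since": "09.08.2007"}]},
    {"_id": "v2", "label": "Person", "name": "Bob", "age": 24,
     "out_edges": []},
]
```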
3.4.3 Collection of Tuples. Tuples are the basis of NoSQL stores called tuple stores. A tuple store
generalizes an RDF store: RDF stores are restricted to triples (or – in some cases – 4-tuples, also
referred to as quads) whereas tuple stores can contain tuples of an arbitrary size. Thus, the number
of elements in a tuple is not fixed and can vary, even within a single database. Each tuple has an ID
which may also be a direct memory pointer.
A collection of tuples can model a graph in different ways. For example, one tuple of size 𝑛 can
store pointers to other tuples that contain neighborhoods of vertices. The exact mapping between
such tuples and graph data is specific to different databases; we describe an example [224] in § 6.3.
3.4.4 Collection of Tables. Tables are the basis of Relational Database Management Systems
(RDBMS) [22, 64, 114]. Tables consist of rows and columns. Each row represents a single data
element, for example a car. A single column usually defines a certain data attribute, for example the
color of a car. Some columns can define unique IDs of data elements, called primary keys. Primary
keys can be used to implement relations between data elements. A one-to-one or a one-to-many
relation can be implemented with a single additional column that contains a copy of the primary
key of the related data element (such a primary key copy is called a foreign key). A many-to-many
relation can be implemented with a dedicated table containing foreign keys of related data elements.
To model a graph as a collection of tables, one can implement vertices and edges as rows in two
separate tables. Each vertex has a unique primary key that constitutes its ID. Edges can relate to
their source or destination vertices by referring to their primary keys (as foreign keys). LPG labels
and properties, as well as RDF predicates, can be modeled with additional columns [225, 229]. We
present and analyze different graph database systems [23, 171] based on tables in § 6.6 and § 6.7.
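A minimal sketch of this two-table layout, using Python's built-in sqlite3 module (table and column names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE vertices (
        id    INTEGER PRIMARY KEY,   -- the vertex ID (primary key)
        label TEXT,
        name  TEXT                   -- one column per LPG property
    );
    CREATE TABLE edges (
        src   INTEGER REFERENCES vertices(id),  -- foreign keys relate an
        dst   INTEGER REFERENCES vertices(id),  -- edge to its endpoints
        label TEXT
    );
""")
db.execute("INSERT INTO vertices VALUES (1, 'Person', 'Alice')")
db.execute("INSERT INTO vertices VALUES (2, 'Person', 'Bob')")
db.execute("INSERT INTO edges VALUES (1, 2, 'knows')")
print(db.execute("SELECT dst FROM edges WHERE src = 1").fetchall())  # [(2,)]
```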
3.4.5 Collection of Objects. One can also use collections of objects in Object-Oriented Database
Management Systems (OODBMS) [21] to model graphs. Here, data elements and their relations
are implemented as objects linked with some form of pointers. The details of modeling graphs as
objects heavily depend on specific designs. We provide details for an example system [223] in § 6.8.
Fig. 7. Illustration of different query scopes (single vertices and edges; subgraphs, paths, and patterns; the whole
graph) and their relation to other graph query taxonomy aspects (OLTP vs. OLAP), in the context of accessing a Labeled
Property Graph.
part. We call the initial vertex or the set of vertices the anchor or root of the traversal. Queries can
restrict what edges or vertices can be retrieved or traversed. As this is a common graph database
task, this query is also used in different performance benchmarks [62, 76, 124].
Global Graph Analytics Finally, we identify global graph analytics queries, which by definition
consider the whole graph (not necessarily every property but all vertices and edges). Different
benchmarks [30, 56, 76, 154] take these large-scale queries into account since they are used in
different fields such as threat detection [79] or computational chemistry [25]. As indicated in
Tables 2 and 3, many graph databases support such queries. Graph processing systems such as
Pregel [148] or Giraph [15] focus specifically on resolving global analytics [107]. Example queries
include resolving global pattern matching [60, 202], shortest paths [74], max-flow or min-cut [67],
minimum spanning trees [134], diameter, eccentricity, connected components, PageRank [175],
and many others. Some traversals can also be global (e.g., finding all shortest paths of unrestricted
length), thus falling into the category of global analytics queries. Global analytics workloads have
been a subject of numerous research efforts in the last decade [29, 37, 38, 126, 155, 178, 204].
4.2.2 Classes of Graph Workloads. We also outline an existing taxonomy of graph database
workloads that is provided as a part of the LDBC benchmarks [9]. LDBC is an effort by academia
and industry to establish a set of standard benchmarks for measuring the performance of graph
databases. The effort currently specifies interactive workloads, Business Intelligence workloads, and
graph analytics workloads.
Interactive Workloads A part of LDBC called the Social Network Benchmark (SNB) [85]
identifies and analyzes interactive workloads that can collectively be described as either read-only
queries or simple transactional updates. They are divided into three further categories. First, short
read-only queries start with a single graph element (e.g., a vertex) and lookup its neighbors or
conduct small traversals. Second, complex read-only queries traverse larger parts of the graph; they
are used in the LDBC benchmark to not just assess the efficiency of the data retrieval process but
also the quality of query optimizers. Finally, transactional update queries insert, modify, or remove
either a single element (e.g., a vertex), possibly together with its adjacent edges, or a single edge.
This workload tests common graph database operations such as the lookup of a friend profile in a
social network, or friendship removal.
Business Intelligence Workloads Next, LDBC identifies Business Intelligence (BI) workloads [212],
which fetch large data volumes, spanning large parts of a graph. In contrast to the interactive
workloads, the BI workloads heavily use summarization and aggregation operations such as sorting,
counting, or deriving minimum, maximum, and average values. They are read-only. The LDBC spec-
ification provides an extensive list of BI workloads that were selected so that different performance
aspects of a database are properly stressed when benchmarking.
Graph Analytics Workloads Finally, the LDBC effort comes with a graph analytics bench-
mark [121], where six graph algorithms are proposed as a standard benchmark for a graph analytics
part of a graph database. These algorithms are Breadth-First Search, PageRank, weakly connected
components [99], community detection using label propagation [47], deriving the local clustering
coefficient [197], and computing single-source shortest paths [74].
LDBC Workloads The LDBC interactive workloads correspond to local queries, neighborhood
queries, and traversals. The LDBC Business Intelligence workloads range from traversals to global
graph analytics queries. The LDBC graph analytics benchmark corresponds to global graph analytics.
4.2.3 Graph Patterns and Navigational Expressions. Angles et al. [8] inspected in detail the theory
of graph queries. In one identified family of graph queries, called simple graph pattern matching,
one prescribes a graph pattern (e.g., a specification of a class of subgraphs) that is then matched to
the graph maintained by the database, searching for the occurrences of this pattern. This query
can be extended with aggregation and a projection function to so-called complex graph pattern
matching. Next, path queries allow searching for paths of arbitrary lengths in the graph. One can
also combine complex graph pattern matching and path queries, resulting in navigational graph
pattern matching, in which a graph pattern can be applied recursively on the parts of the path.
4.2.4 Input Loading. Finally, certain benchmarks also analyze bulk input loading [62, 76, 124].
Specifically, given an input dataset, they measure the time to load this dataset into a database. This
scenario is common when data is migrated between systems.
[Figure: Overview of the taxonomy questions and options. System backends: what is the general type of a database storage backend (native triple store, native LPG store, tuple store, document store, key-value store, wide-column store, RDBMS, OODBMS)? For example, a row RDBMS stores vertices and edges in rows of two row-oriented tables, a column RDBMS in rows of two column-oriented tables, and an OODBMS as language objects (Java, C#, etc.). Data organization: what representations of graphs are used (AM, AL), what conceptual models of graph data are supported (RDF triples, LPG), and what common optimizations are used? Data distribution: can the system run in a distributed mode, and are data sharding and data replication supported? Indexes: how are indexes implemented (tree, hashtable, skip list), and what are indexes used for (internal and external data indexes, neighborhood indexes, structural indexes)? Query execution.]
5.2 Data Organization
In data organization, we distinguish the following taxonomy aspects: (1) graph structure representation,
(2) conceptual data models, (3) indexes, (4) data distribution, and (5) common optimizations.
First, in graph structure representation, we analyze whether the graph structure is stored
using the AL or the AM representation (see § 3.2). The graph structure representation directly
impacts the performance of queries.
Second, we investigate what conceptual data models are supported by different graph databases.
Here, we focus on the RDF and LPG models as well as their variants, described in § 3.3. The used
graph model strongly influences what graph query languages can be used together with a given
system, and it also has an impact on the associated data layout.
We also analyze how graph databases use indexes to accelerate accessing data. Indexes can
significantly improve the performance of GDBs, and they are widely used, for example in RDF
systems [2, 131, 182]. Here, we consider the functionality (i.e., the use case) of a given index, and
how a given index is implemented². As for the former, we identify four different index use cases:
storing the locations of vertex neighborhoods (referred to as “neighborhood indexes”), indexing
graph elements, such as vertices, that satisfy pre-specified conditions related to rich graph data
(referred to as “data indexes”), storing the actual graph data, and maintaining non-graph related
data (referred to as “structural indexes”). As for the latter, we identify three fundamental data
structures used to implement indexes: trees, skip lists, and hashtables.
We also identify whether a database can run in a distributed mode. A system is distributed or
multi-server if it can run on multiple servers (also called compute nodes) connected with a network.
In such systems, data may be replicated [94] (maintaining copies of the dataset at each server), or it
may allow for sharding [84] (data fragmentation, i.e., storing only a part of the given dataset on
one server). Replication often allows for more fault tolerance [83], while sharding reduces the amount of
used memory per node and can improve performance [83]. Gathering this information facilitates
selecting a system with the most appropriate performance properties in a given context. For
example, systems that replicate but do not shard the data may offer higher performance for read-only
workloads, but may scale poorly to particularly large graphs that would otherwise require disk spilling.
Finally, we identify common optimizations: (1) dividing data into records, (2) lightweight
edges, and (3) linking records with direct pointers.
Dividing Data into Records Graph databases usually organize data into small units called
records. One record contains information about a single entity (e.g., a person); this information
is organized into specified logical fields (a name, a surname, etc.). A certain number of records
is often kept together in one contiguous block in memory or on disk to enhance data access locality.
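As an illustration, the following sketch (a hypothetical fixed-size layout, using Python's struct module) packs records with fixed logical fields into one contiguous block:

```python
import struct

# A hypothetical fixed-size vertex record: an 8-byte vertex ID, an 8-byte
# offset of the first adjacent edge record, and a 16-byte name field.
RECORD = struct.Struct("<qq16s")

block = bytearray(RECORD.size * 64)  # 64 records kept together in one block

def write_record(slot, vertex_id, first_edge, name):
    RECORD.pack_into(block, slot * RECORD.size,
                     vertex_id, first_edge, name.encode())

def read_record(slot):
    vid, first_edge, name = RECORD.unpack_from(block, slot * RECORD.size)
    return vid, first_edge, name.rstrip(b"\0").decode()

write_record(0, 42, -1, "Alice")
print(read_record(0))  # (42, -1, 'Alice')
```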
Enabling Lightweight Edges Some systems (e.g., OrientDB) allow edges without labels or
properties to be stored as lightweight edges. Such edges are stored in the records of the corresponding
source and/or destination vertices. These lightweight edges are represented by the ID of their
destination vertex, or by a pointer to this vertex. This can save storage space and accelerate resolving
different graph queries such as verifying connectivity of two vertices [54].
Linking Records with Direct Pointers In record based systems, vertices and edges are stored
in records. To enable efficient resolution of connectivity queries (i.e., verifying whether two vertices
are connected), these records have to point to other records. One option is to store direct pointers
(i.e., memory addresses) to the respective connected records. For example, an edge record can store
direct pointers to vertex records with adjacent vertices. Another option is to assign each record
a unique ID and use these IDs instead of direct pointers to refer to other records. On one hand,
this requires an additional indexing structure to find the physical location of a record based on its
ID. On the other hand, if the physical location changes, it is usually easier to update the indexing
structure than to change all associated direct pointers.
²We do not include the index information in Tables 2–3 because of lack of space, and instead provide a detailed separate
analysis in § 7.2.7.
Note that specific systems can employ other diverse optimizations. For example, in addition to
using index structures to maintain the locations of data, some databases also store the graph data
in the indexes themselves. In such cases, the index does not point to a certain data record but the
index itself contains the desired data. Example systems with such functionality are Sparksee/DEX
and Cray Graph Engine. To maintain indices, the former uses bitmaps and B+ trees while the latter
uses hashtables.
Graph Database System | oB | Model: lpg, rdf | Repr.: al, am | Data Organization: fs, vs, dp, se, sv, lw | Data Distribution & Query Execution: ms, rp, sh, ce, pe, tr, oltp, olap | Additional remarks
NATIVE GRAPH DATABASES (RDF model based, triple stores) (§ 6.2). The main data model used: RDF triples (§ 3.3.4).
AllegroGraph [92] é é é é ∗é é é é é é ? ∗ Triples are stored as integers (RDF strings
map to integers).
BlazeGraph [44] é ∗ ∗é é ? ? é é é é ? ? ? ? ∗ BlazeGraph uses RDF*, an extension of RDF
(details in § 6.2.2).
Cray Graph Engine [187] é é é é é∗ é∗ é é é é é é é é ∗ RDF triples are stored in hashtables.
Amazon Neptune [5] é é é ? ? é é é é é é —
AnzoGraph [55] é é é ? ? é é é é é —
Apache Jena TDB [17] é é ? é ? ? ? ? ? ? é ? é é ? —
Apache Marmotta [16] é é é é ∗é é é é é ? ? ? ∗ The structure of data records is based on
that of different RDBMS systems
(H2 [163], PostgreSQL [162], MySQL [78]).
BrightstarDB [164] é é é é ? ? é é é é ? ? ? ? ? —
gStore [231] é é é é é é é ? ? é ? é ? ? ? —
Ontotext GraphDB [169] é é é é ? ? é é é é é ? ? —
Profium Sense [181] é é ∗é é ? ? é é é é ? ? ? ∗ The format used is called JSON-LD:
JSON for vertices and RDF for edges.
TripleBit [228] é é é é é ∗ é é é é é‡ é é é é ? ? The data organization uses compression.
∗ Strings map to variable size integers.
‡ Described as future work.
NATIVE GRAPH DATABASES (LPG model based) (§ 6.9). The main data model used: LPG (§ 3.3.2, § 3.3.3).
Neo4j [188] é é é é é é é é Neo4j is provided as a cloud service by a
system called Graph Story [102].
Sparksee/DEX [151] é é é∗ é é‡ é‡ é é é é é ∗ Bitmaps are used for connectivity.
‡ The system uses maps only.
GBase [127] é é∗ é é‡ é é é é é é ? ? ? ? ? é ∗ GBase supports simple graphs only (§ 3.1).
‡ GBase stores the AM sparsely.
GraphBase [86] é é∗ é ? é é ? ? ? ? ? ? ? ∗ No support for edge properties, only two
types of edges available.
Graphflow [128] é é é ? ? ? ? ? ? é ? ? ? ? ? ? —
LiveGraph [230] é é é é é é é é ? é ? —
Memgraph [156] é é é ? ? ? ? ? ? ∗ ‡ ∗ This feature is under development.
‡ Available only for some algorithms.
TigerGraph [218] é é ? é ? ? ? ? ? ? —
Weaver [77] é é ? é ? ? ? ? ? ? —
KEY-VALUE STORES (§ 6.4). The main data model used: key-value pairs (§ 3.4.1).
DOCUMENT STORES (§ 6.5). The main data model used: documents (§ 3.4.2).
ArangoDB [18] é é∗ é é é é é é ∗ Uses a hybrid index for retrieving edges.
OrientDB [53] é ∗ é é é ‡ é ∗ AL contains RIDs (i.e., physical locations)
of edge and vertex records. ‡ Sharding is
user defined. OrientDB supports JSON and
it offers certain object-oriented capabilities.
Azure Cosmos DB [161] é é é é é é é é ? —
Bitsy [141] é é é é é é é é é é é é é The system is disk based and uses JSON files.
The storage only allows for appending data.
FaunaDB [88] ∗ é ‡é é é é é é ∗ Document, RDBMS, graph, “time series”.
‡ Adjacency lists are separately precomputed.
Table 2. Comparison of graph databases (native graph databases based on RDF and LPG, key-value stores, and
document stores). Bolded systems are described in more detail in the corresponding sections. oB: A system supports
secondary data models / backend types (in addition to its primary one). lpg, rdf: A system supports, respectively, the
Labeled Property Graph and RDF without prior data transformation. am, al: The structure is represented as the
adjacency matrix or the adjacency list. fs, vs: Data records are fixed size and variable size, respectively. dp: A system
can use direct pointers to link records. This enables storing and traversing adjacency data without maintaining indices.
se: Edges can be stored in a separate edge record. sv: Edges can be stored in a vertex record. lw: Edges can be
lightweight (containing just a vertex ID or a pointer, both stored in a vertex record). ms: A system can operate in a
Multi-Server (distributed) mode. rp: Given a distributed mode, a system enables Replication of datasets. sh: Given a
distributed mode, a system enables Sharding of datasets. ce: Given a distributed mode, a system enables Concurrent
Execution of multiple queries. pe: Given a distributed mode, a system enables Parallel Execution of single queries on
multiple nodes/CPUs. tr: Support for ACID Transactions. oltp: Support for Online Transaction Processing. olap:
Support for Online Analytical Processing. ✓: A system offers a given feature. ◐: A system offers a given feature in a
limited way. é: A system does not offer a given feature. ?: Unknown.
Graph Database System | oB | Model: lpg, rdf | Repr.: al, am | Data Organization: fs, vs, dp, se, sv, lw | Data Distribution & Query Execution: ms, rp, sh, ce, pe, tr, oltp, olap | Additional remarks
RELATIONAL DBMS (RDBMS) (§ 6.7). The main data model used: tables (implementing relations) (§ 3.4.4).
Oracle Spatial ∗ é é ?∗ ?∗ é ∗ é é ∗ LPG and RDF use row-oriented storage.
and Graph [171] The system can also run on top of PGX [117]
(effectively as a native graph database).
AgensGraph [43] é é é ? ? é ? é é AgensGraph is based on PostgreSQL.
FlockDB [219] é é é é é ? ? é ? é é é é é The system focuses on “shallow” graph
queries, such as finding mutual friends.
IBM Db2 é é é é ? ? é ∗ ∗ ‡ ‡ ‡ ‡ ‡ ‡ ? ∗ can store vertices/edges in the same table.
Graph [217] ‡ inherited from the underlying IBM Db2™.
MS SQL Server é é é ? ? é ? é é The system uses an SQL graph extension.
2017 [160]
OQGRAPH [149] é é é é ?∗ ?∗ é ∗ é é é ? OQGRAPH uses MariaDB [28].
∗ OQGRAPH uses row-oriented storage.
SAP HANA [195] é é é é∗ é∗ é é∗ é é ∗ SAP HANA is column-oriented, edges and
vertices are stored in rows. SAP HANA can
be used with a dedicated graph engine [193];
it offers some capabilities of a JSON document
store [195].
SQLGraph [210] é é é é ? ∗é ‡ é † † † † † † ? ∗ SQLGraph uses JSON for property storage.
‡ SQLGraph uses row-oriented storage.
† depends on the used SQL engine.
WIDE-COLUMN STORES (§ 6.6). The main data model used: key-value pairs and tables (§ 3.4.1, § 3.4.4).
JanusGraph [23] é é é é é é é JanusGraph is the continuation of Titan.
Titan [23] é é é é é é Enables various backends (e.g.,
Cassandra [140]).
DSE Graph é é é é é é é ? ∗ DSE Graph is based on Cassandra [140].
(DataStax) [68] ∗ Support for AID (Atomicity, Isolation, Durability); Consistency is configurable.
HGraphDB [227] é é é é é é é é ? ? é∗ HGraphDB uses TinkerPop3 with HBase [97].
∗ ACID is supported only within a row.
TUPLE STORES (§ 6.3). The main data model used: tuples (§ 3.4.3).
OBJECT-ORIENTED DATABASES (OODBMS) (§ 6.8). The main data model used: objects (§ 3.4.5).
Velocity- é é é é é The system is based on VelocityDB [222].
Graph [223]
Objectivity é é ? ? ? ? ? ? ? The system is based on ObjectivityDB [105].
ThingSpan [167]
DATA HUBS (§ 6.10). The main data model used: several different ones.
MarkLogic [150] é∗ é é ? é ∗ é é ? Supported storage/models: relational tables,
RDF, various documents. ∗ Vertices are stored
as documents, edges are stored as RDF triples.
OpenLink é é é ? ? é é é é ∗ Supported storage/models: relational tables
Virtuoso [170] and RDF triples. ∗ This feature can be used
with relational data only.
Cayley [58] ? ? ? ? ? ? ? é ? ∗ ? ? Supported storage/models: relational tables,
RDF, document, key-value. ∗ This feature
depends on the backend.
InfoGrid [119] é ? ? ? ? é é ? ∗ ? ? Supported storage/models: relational tables,
Hadoop’s filesystem, grid storage. ∗ A weaker
consistency model is used instead of ACID.
Stardog [208] ∗ ∗é é ? é ∗ é é é ? Supported storage/models: relational tables,
documents. ∗ RDF is simulated on relational
tables. Both LPG and RDF are enabled
through virtual quints.
Table 3. Comparison of graph databases (RDBMS, wide-column stores, tuple stores, OODBMS, data hubs).
Bolded systems are described in more detail in the corresponding sections. oB: A system supports secondary data models /
backend types (in addition to its primary one). lpg, rdf: A system supports, respectively, the Labeled Property Graph and
RDF without prior data transformation. am, al: The structure is represented as the adjacency matrix or the adjacency
list. fs, vs: Data records are fixed size and variable size, respectively. dp: A system can use direct pointers to link
records. This enables storing and traversing adjacency data without maintaining indices. se: Edges can be stored in a
separate edge record. sv: Edges can be stored in a vertex record. lw: Edges can be lightweight (containing just a vertex
ID or a pointer, both stored in a vertex record). ms: A system can operate in a Multi Server (distributed) mode. rp: Given a
distributed mode, a system enables Replication of datasets. sh: Given a distributed mode, a system enables Sharding
of datasets. ce: Given a distributed mode, a system enables Concurrent Execution of multiple queries. pe: Given a
distributed mode, a system enables Parallel Execution of single queries on multiple nodes/CPUs. tr: Support for ACID
Transactions. oltp: Support for Online Transaction Processing. olap: Support for Online Analytical Processing. ✓:
A system offers a given feature. ◐: A system offers a given feature in a limited way. é: A system does not offer a given
feature. ?: Unknown.
Graph Database System | Query Languages: SPARQL, Gremlin, Cypher, SQL, GraphQL, Progr. API | Other languages and additional remarks
NATIVE GRAPH DATABASES (RDF model based, triple stores) (§ 6.2).
AllegroGraph é é é é é é
Amazon Neptune é é é é é
AnzoGraph é é é é é
Apache Jena TDB é é é é (Java) é
Apache Marmotta é é é é é Apache Marmotta also supports its native LDP and LDPath languages.
BlazeGraph ∗ é é é é ∗ BlazeGraph offers SPARQL* to query RDF*.
BrightstarDB é é é é é é
Cray Graph Engine é é é é é é
gStore é é é é é é
Ontotext GraphDB é é é é é é
Profium Sense é é é é é é
TripleBit é é é é é é
NATIVE GRAPH DATABASES (LPG model based) (§ 6.9).
GBase é é é é é é
GraphBase é é é é é é GraphBase uses its native query language.
Graphflow é é ∗‡ é é é ∗ Graphflow supports a subset of Cypher [159]. ‡ Graphflow supports
Cypher++ extension with subgraph-condition-action triggers [128].
LiveGraph é é é é é é No focus on languages and queries.
Memgraph é é ∗ é é é ∗ openCypher.
Neo4j é ∗ é ‡ † ∗ Gremlin is supported as a part of TinkerPop integration.
‡ GraphQL supported with the GRANDstack layer.
† Neo4j can be embedded in Java applications.
Sparksee/DEX é é é é (.NET)∗ ∗ Sparksee/DEX also supports C++, Python, Objective-C, and Java APIs.
TigerGraph é é é é é é TigerGraph uses GSQL [218].
Weaver é é é é é (C)∗ ∗ Weaver also supports C++, Python.
Table 4. Support for different graph database query languages in different graph database systems. “Progr. API”
determines whether a given system supports formulating queries using some native programming language such as C++.
“✓”: A system supports a given language. “◐”: A system supports a given language in a limited way. “é”: A system does
not support a given language.
model, as well as data and storage organization. Tables 2 and 3 illustrate the details of different
graph database systems, including the ones described in this section⁴. The tables indicate which
features are supported by which systems. We use the symbols “✓”, “◐”, and “é” to indicate that a
given system offers a given feature, offers a given feature in a limited way, and does not offer a
given feature, respectively. “?” indicates that we were unable to infer this information from the
available documentation. We report the support for different graph query languages in Table 4.
Finally, we analyze different taxonomy aspects in § 7.
6.2.1 Cray Graph Engine. Cray Graph Engine (CGE) [187] is a triple store that can scale to a
trillion RDF triples. CGE does not store triples but quads (4-tuples), where the fourth element is a
graph ID. Thus, one can store multiple graphs in one CGE database. Quads in CGE are grouped
by their predicate and the identifier of the graph that they are a part of. Thus, only a pair with a
subject and an object needs to be stored for one such group of quads. These subject/object pairs are
stored in hashtables (one hashtable per group). Since each subject and object is represented as a
unique 48-bit integer identifier (HURI), the subject/object pairs can be packed into 12 bytes and
stored in a 32-bit unsigned integer array, ultimately reducing the amount of needed storage.
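To illustrate the packing (a sketch in Python; CGE's actual layout may differ in details such as endianness), two 48-bit HURIs fit exactly into three 32-bit unsigned integers:

```python
import struct

def pack_pair(subject_huri, object_huri):
    """Pack two 48-bit integer IDs into 12 bytes (three 32-bit words)."""
    assert 0 <= subject_huri < 2**48 and 0 <= object_huri < 2**48
    combined = (subject_huri << 48) | object_huri          # 96 bits in total
    return struct.pack("<III", combined & 0xFFFFFFFF,
                       (combined >> 32) & 0xFFFFFFFF, combined >> 64)

def unpack_pair(buf):
    lo, mid, hi = struct.unpack("<III", buf)
    combined = (hi << 64) | (mid << 32) | lo
    return combined >> 48, combined & (2**48 - 1)

pair = pack_pair(123456789, 987654321)
assert len(pair) == 12 and unpack_pair(pair) == (123456789, 987654321)
```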
6.2.2 AllegroGraph and BlazeGraph. There exist many other RDF graph databases. We briefly
describe two systems that extend the original RDF model: AllegroGraph and BlazeGraph.
First, some RDF stores allow for attaching attributes to a triple explicitly. AllegroGraph [92]
allows an arbitrary set of attributes to be defined per triple when the triple is created. However,
these attributes are immutable. Figure 10 presents an example RDF graph with such attributes. This
figure uses the same LPG graph as in previous examples provided in Figure 5 and Figure 6, which
contain example transformations from the LPG into the original RDF model.
Second, BlazeGraph [44] implements RDF* [109, 110], an augmentation of RDF that allows for
attaching triples to triple predicates (see Figure 11). Vertices can use triples for storing labels and
properties, analogously to the plain RDF. However, with RDF*, one can represent LPG edges
more naturally than in the plain RDF. Specifically, edges can be stored as triples, and edge properties
can be linked to the edge triple via other triples.
⁴We encourage participation in this survey. In case the reader is in possession of additional information relevant for the
tables, the authors would welcome the input.
⁵https://db-engines.com/en/ranking/graph+dbms
Fig. 10. Comparison of an LPG graph and an RDF graph: a transformation from LPG to RDF with triple attributes.
We represent the triple attributes as a set of key-value pairs. “Person/V-ID”, “age”, “name”, “type” and “knows” are RDF
URIs. The transformation uses the assumption that there is one label per vertex and edge.
Fig. 11. Comparison of an LPG graph and an RDF* graph: a transformation from LPG to RDF*, which enables attaching
triples to triple predicates. “Person/V-ID”, “age”, “name”, “type”, “since” and “knows” are RDF URIs. The transformation
uses the assumption that there is one label per vertex and edge.
6.3.1 WhiteDB. WhiteDB [224] is a tuple store that enables allocating new records (tuples) with
an arbitrary tuple length (number of tuple elements). Small values and pointers to other tuples
are stored directly in a given field. Large strings are kept in a separate store. Each large value
is only stored once, and a reference counter keeps track of how many tuples refer to it at any
time. WhiteDB only enables accessing single tuple records; there is no higher-level query engine
or graph API that would allow one to, for example, execute a query that fetches all neighbors of a
given vertex. However, one can use tuples as vertex and edge storage, linking them to one another
via memory pointers. This facilitates fast resolution of various queries about the structure of an
arbitrary irregular graph. For example, one can store a vertex 𝑣 with its
properties as consecutive fields in a tuple associated with 𝑣, and maintain pointers to selected
neighborhoods of 𝑣 in 𝑣’s tuple. More examples on using WhiteDB (and other tuple stores such as
Graphd) for maintaining graph data can be found online [157, 224].
6.4.1 Microsoft’s Graph Engine (Trinity). Microsoft’s Graph Engine [199] is based on a distributed
KV store called Trinity. Trinity implements a globally addressable distributed RAM storage. In
Trinity, keys are called cell IDs and values are called cells. A cell can hold data items of different data
types, including IDs of other cells. MS Graph Engine introduces a graph storage layer on top of the
Trinity KV storage layer. Vertices are stored in cells, where a dedicated field contains a vertex ID or
a hash of this ID. Edges adjacent to a given vertex 𝑣 are stored as a list of IDs of 𝑣’s neighboring
vertices, directly in 𝑣’s cell. However, if an edge holds rich data, such an edge (together with the
associated data) can also be stored in a separate dedicated cell.
6.4.2 HyperGraphDB. HyperGraphDB [120] stores hypergraphs (definition in § 3.3.1). The
basic building blocks of HyperGraphDB are atoms, the values of the KV store. Every atom has
a cryptographically strong ID. This reduces the chance of collisions (i.e., creating identical IDs for
different graph elements by different peers in a distributed environment). Both hypergraph vertices
and hyperedges are atoms. Thus, they have their own unique IDs. An atom of a hyperedge stores a
list of IDs corresponding to the vertices connected by this hyperedge. Vertices and hyperedges also
have a type ID (i.e., a label ID) and they can store additional data (such as properties) in a recursive
structure (referenced by a value ID). This recursive structure contains value IDs identifying other
atoms (with other recursive structures) or binary data. Figure 12 shows an example of how a KV
store is used to represent a hypergraph in HyperGraphDB.
Fig. 12. An example utilization of key-value stores for maintaining hypergraphs in HyperGraphDB (a type is a
term used in HyperGraphDB to refer to a label).
Figure 13 contains an example of using documents for representing vertices, regular edges, and
lightweight edges in OrientDB. Figure 14 shows example vertex and edge documents.
Fig. 13. Two vertex documents connected with a lightweight edge and a regular edge (knows) in OrientDB.
Fig. 14. Example OrientDB vertex and edge documents (complex JSON documents are also supported). A vertex
document contains key-value attributes, RIDs of incoming and outgoing edge documents, and vertex RIDs of lightweight
edges; a regular edge document contains key-value attributes and the RIDs of its incoming and outgoing vertices.
6.5.2 ArangoDB. ArangoDB [18, 19] keeps its documents in a binary format called VelocyPack,
which is a compacted implementation of JSON documents. Documents can be stored in different
collections and have a _key attribute which is a unique ID within a given collection. Unlike
OrientDB, these IDs are not direct memory pointers. For maintaining graphs, ArangoDB uses vertex
collections and edge collections. The former are regular document collections with vertex documents.
Vertex documents store no information about adjacent edges. This has the advantage that a vertex
document does not have to be modified when one adds or removes edges. Second, edge collections
store edge documents. Edge documents have two particular properties: _from and _to, which are
the IDs of the documents associated with two vertices connected by a given edge. An optimization
in ArangoDB’s design prevents reading vertex documents and enables directly accessing one edge
document based on the vertex ID within another edge document. This may improve cache efficiency
and thus reduce query execution time [19].
One can use different collections of documents to store different edge types (e.g., “friend_of” or
“likes”). When retrieving edges conditioned on some edge type (e.g., “friend_of”), one does not have
to traverse the whole adjacency list (all “friend_of” and “likes” edges). Instead, one can target the
collection with the edges of the specific edge type (“friend_of”).
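A sketch of this layout (Python dictionaries standing in for VelocyPack/JSON documents; the _key, _from, and _to attribute names follow ArangoDB, while the collection contents are illustrative):

```python
# A vertex collection: vertex documents store no adjacency information,
# so adding or removing edges never modifies them.
persons = [
    {"_key": "alice", "name": "Alice", "age": 21},
    {"_key": "bob", "name": "Bob", "age": 24},
]

# One edge collection per edge type: a query for "friend_of" edges only
# scans this collection and never touches, e.g., "likes" edges.
friend_of = [
    {"_from": "persons/alice", "_to": "persons/bob", "since": "09.08.2007"},
]
likes = [
    {"_from": "persons/bob", "_to": "persons/alice"},
]
```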
Fig. 15. An illustration of wide-column stores: mapping keys to rows and column-keys to cells within the rows.
6.6.1 Titan and JanusGraph. Titan [23] and its continuation JanusGraph [144] are built on top of
wide-column stores. They can use different wide-column stores as backends, for example Apache
Cassandra [14]. In both systems, when storing a graph, each row represents a vertex. Each vertex
property and adjacent edge is stored in a separate cell. One edge is thus encoded in a single cell,
including all the properties of this edge. Since cells in each row are sorted by the cell key, this
sorting order can be used to find cells efficiently. For graphs, cell keys for properties and edges are
chosen such that, after sorting the cells, the cells storing properties come first, followed by the cells
containing edges (see Figure 16). Since rows are ordered by the key, both systems straightforwardly
partition tables into so-called tablets, which can then be distributed over multiple servers.
Fig. 16. An illustration of Titan and JanusGraph: using wide-column stores for storing graphs. The illustration is
inspired by and adapted from the work by Sharma [200].
of RDBMS tables constitute vertices and relationships between these rows form edges. Associated
properties and attributes are stored as key-value pairs in separate structures.
Fig. 17. Summary of the Neo4j structure: two vertices linked by a “knows” edge. Both vertices maintain linked lists of
properties. The edges are part of two doubly linked lists, one linked list per adjacent vertex.
[Figure: The layout of a Neo4j vertex record: inUse flags, a nextEdgeID field (a link to the first edge record), a nextPropID field (a link to a linked list of property records, each holding four property blocks), and labels; the numbers 1, 5, 9, and 14 mark byte offsets of fields within the record.]
one indicates that the corresponding ID has some property value. Labels and vertices/edges are
mapped to each other in a similar way. Moreover, for each vertex, two bitmaps are stored: one
bitmap indicates the incoming edges, and another one the outgoing edges. Furthermore, two B+
trees maintain the information about what vertices an edge is connected to (one tree for each edge
direction). Figure 19 illustrates example mappings.
Sparksee is one of the few systems that are not record based. Instead, Sparksee uses maps
implemented as B+ trees [65] and bitmaps. The use of bitmaps allows for some operations to be
performed as bit-level operations. For example, if one wants to find all vertices with certain values
of properties such as “age” and “first name”, one can simply find two bitmaps associated with the
“age” and the “first name” properties, and then derive a third bitmap that is a result of applying a
bitwise AND operation to the two input bitmaps.
Uncompressed bitmaps could grow unmanageably in size. As most graphs are sparse, bitmaps
indexed by vertices or edges mostly contain zeros. To alleviate large sizes of such sparse bitmaps,
they are cut into 32-bit clusters. If a cluster contains a non-zero bit, it is stored explicitly. The bitmap
is then represented by a collection of (cluster-id, bit-data) pairs. These pairs are stored in a sorted
tree structure to allow for efficient lookup, insertion, and deletion.
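Both ideas can be sketched as follows (illustrative Python; Sparksee keeps the (cluster-id, bit-data) pairs in a sorted tree, whereas a dictionary is used here for brevity):

```python
CLUSTER_BITS = 32

def set_bit(bitmap, i):
    """bitmap maps cluster-id -> 32-bit cluster; only non-zero clusters are kept."""
    cid, off = divmod(i, CLUSTER_BITS)
    bitmap[cid] = bitmap.get(cid, 0) | (1 << off)

def bitmap_and(a, b):
    """Intersect two sparse bitmaps cluster by cluster."""
    return {cid: bits for cid in a.keys() & b.keys()
            if (bits := a[cid] & b[cid])}

# Vertices with age = 21 and first name = "Alice":
age_21, name_alice = {}, {}
for v in (3, 70, 512):
    set_bit(age_21, v)
for v in (70, 99, 512):
    set_bit(name_alice, v)
print(bitmap_and(age_21, name_alice))  # bits 70 and 512 remain set
```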
Fig. 19. Sparksee maps for properties, labels, and vertex/edge connectivity. All mappings are bidirectional.
6.9.3 GBase: Sparse Adjacency Matrix Format. GBase [127] is a system that can only represent
the structure of a directed graph; it stores neither properties nor labels. The goal of GBase is to
maintain a compression of the adjacency matrix of a graph such that one can efficiently retrieve
all incoming and outgoing edges of a selected vertex without the prohibitive 𝑂(𝑛²) matrix storage
overheads. Simultaneously, using the adjacency matrix enables verifying in 𝑂(1) time whether
two arbitrary vertices are connected. To compress the adjacency matrix, GBase cuts it into 𝐾²
square blocks (there are 𝐾 blocks along each row and column). Thus, queries that fetch the in- and
out-neighbors of a vertex require fetching only 𝐾 blocks. The parameter 𝐾 can be optimized
for specific databases. When 𝐾 becomes smaller, one has to retrieve more small files (assuming
one block is stored in one file). If 𝐾 grows larger, there are fewer files but they become larger,
generating overheads. Further optimizations can be made when blocks contain either only zeroes
or only ones; this enables higher compression rates.
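The block partitioning can be sketched as follows (illustrative Python): for an n × n adjacency matrix cut into K² blocks, the out-edges of a vertex u lie in the K blocks covering u's block-row (and, symmetrically, the in-edges lie in u's block-column):

```python
def out_edge_blocks(n, K, u):
    """Return the K block coordinates covering row u of the adjacency matrix."""
    side = (n + K - 1) // K            # side length of one block, ceil(n / K)
    row_block = u // side
    return [(row_block, col_block) for col_block in range(K)]

# For n = 16 and K = 4, the out-edges of vertex 9 lie in block-row 2:
print(out_edge_blocks(16, 4, 9))  # [(2, 0), (2, 1), (2, 2), (2, 3)]
```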
supports simple triples (subject, predicate, object) representing edges from subject identifiers via
predicates to objects. LPG allows vertices and edges to have labels and properties, thus enabling
more natural data modeling in different scenarios [188]. Still, it is not standardized, and there are
many variants (cf. § 3.3.3); however, it is becoming standardized in the upcoming SQL/PGQ and
GQL standards from ISO [90]. Some systems limit the number of labels to just one. For example,
MarkLogic allows properties for vertices but none for edges, and thus can be viewed as a combination
of LPG (vertices) and RDF (edges). Data stored in the LPG model can be converted to RDF, as
described in § 3.3.5. To benefit from different LPG features while keeping RDF advantages such as
simplicity, some researchers proposed and implemented modifications to RDF. Examples are triple
attributes or attaching triples to other triples (described in § 6.2.2).
Among native graph databases, while no LPG-focused system simultaneously supports RDF, some RDF systems (e.g., Amazon Neptune) also support LPG. Further, there has been recent work on unifying RDF and LPG [12, 95, 142]. Many other classes (KV stores, document stores, RDBMS, wide-column stores, OODBMS) offer only LPG (with some exceptions, e.g., Oracle Spatial and Graph). This suggests that it may be easier to express LPG datasets than RDF datasets with the respective non-graph data models, such as a collection of documents.
Another interesting challenge is to understand better the design tradeoffs between LPG and RDF.
For example, it is unclear which one is more advantageous for different workload classes, under
different design constraints (disk vs. in-memory representation, distributed vs. shared-memory,
replicated vs. sharded, etc.). This could be achieved by developing formal runtime and storage
models followed by an extensive evaluation.
There are very few systems that use neither RDF nor LPG. HyperGraphDB uses the hypergraph
model and GBase uses a simple directed graph model without any labels or properties. Thus, these
models are of less relevance to practitioners, but they offer a potentially interesting direction for
researchers. Especially in the context of hypergraphs, there has been a recent proliferation of
novel schemes in other domains of graph processing, such as graph learning [24], suggesting that
exploring hypergraphs for graph databases may be a timely direction.
7.1.2 Graph Structure Representations. Many graph databases use variants of AL since it makes traversing neighborhoods efficient and straightforward [188]. This includes several (but not all) systems in the classes of LPG based native graph databases, KV stores, document stores, wide-column stores, tuple stores, and OODBMS. In contrast, none of the considered RDF, RDBMS, and data hub systems explicitly use AL. This is because the default design of the underlying data model, e.g., tables in RDBMS or documents in document stores, does not often rely on AL.
Moreover, none of the systems that we analyzed use an uncompressed AM, as it is inefficient with its O(n^2) space, especially for sparse graphs. Systems using AM focus on compressing the adjacency matrix [36], trying to mitigate storage and query overheads (e.g., GBase [127]).
In AL, a potential cause of inefficiency is scanning all edges to find the neighbors of a given vertex. To alleviate this, index structures are employed [40]. For a graph with n vertices, such an index is an array of pointers to the respective neighborhoods, taking only O(n) space.
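A minimal CSR-style sketch of such an index is shown below (our illustration, assuming a static directed graph); the offsets array is the O(n) index of pointers into the flat neighbor array, so fetching the neighbors of a vertex needs no scan over all edges.

```python
# A minimal CSR-like adjacency-list index (illustrative sketch).
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]  # toy directed graph
n = 3

# Build: count out-degrees, prefix-sum them into offsets, then fill.
deg = [0] * n
for u, _ in edges:
    deg[u] += 1
offsets = [0] * (n + 1)
for v in range(n):
    offsets[v + 1] = offsets[v] + deg[v]
neighbors = [0] * len(edges)
fill = offsets[:-1].copy()
for u, v in edges:
    neighbors[fill[u]] = v
    fill[u] += 1

def out_neighbors(v):
    # offsets[v] "points" to the start of v's neighborhood.
    return neighbors[offsets[v]:offsets[v + 1]]

print(out_neighbors(0))  # -> [1, 2]
```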
7.2.1 Using Existing Storage Designs. Most graph database systems are built upon existing
storage designs, including key-value stores, wide-column stores, RDBMS, and others. The advantage
of using existing storage designs is that these systems are usually mature and well-tested. The
disadvantage is that they may not be perfectly optimized for graph data and graph queries. This
is what native graph databases attempt to address. Overall, there is a lot of research potential in
more efficient and more effective use of existing storage designs for graph databases. For example,
a promising direction would be to investigate how recent RDBMS developments, such as worst-case optimal joins [166], could be used for graph-related workloads.
7.2.2 Data Layout of Records. The details of record-based data organization heavily depend on
a specific system. For example, an RDBMS could treat a table row as a record, key-value stores
often maintain a single value in a single record, while in document stores, a single document could
be a record. Importantly, some systems allow variable-sized records (e.g., ArangoDB), while others only enable fixed-sized records (e.g., Neo4j). Finally, we observe that while some systems (e.g., some
triple stores such as Cray Graph Engine) do not explicitly mention records, the data could still be
implicitly organized as records. In triple stores, one would naturally associate a triple with a record.
Graph databases often use one or more records per vertex (these records are sometimes referred
to as vertex records). Neo4j uses multiple fixed-size records for vertices, while document databases
use one document per vertex (e.g., ArangoDB). Edges are sometimes stored in the same record
together with the associated (source or destination) vertices (e.g., Titan or JanusGraph). Otherwise,
edges are stored in separate edge records (e.g., ArangoDB).
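The following sketch contrasts the two record designs discussed above; the field layouts are hypothetical simplifications for illustration, not the actual Neo4j or ArangoDB formats.

```python
# Fixed-size vs. variable-sized records (illustrative; layouts hypothetical).
import json
import struct

# Fixed-size vertex record: every record occupies the same number of bytes,
# so record i lives at byte offset i * RECORD_SIZE, without any extra index.
VERTEX_RECORD = struct.Struct("<?QQ")  # in-use flag, first-edge ID, first-property ID
RECORD_SIZE = VERTEX_RECORD.size

store = bytearray(RECORD_SIZE * 100)

def write_vertex(vid, first_edge, first_prop):
    VERTEX_RECORD.pack_into(store, vid * RECORD_SIZE, True, first_edge, first_prop)

def read_vertex(vid):
    return VERTEX_RECORD.unpack_from(store, vid * RECORD_SIZE)

write_vertex(7, first_edge=42, first_prop=5)
print(read_vertex(7))  # -> (True, 42, 5)

# Variable-sized record: one JSON document per vertex, located via a key index.
docs = {"v7": json.dumps({"label": "Person", "name": "Ada", "age": 36})}
print(json.loads(docs["v7"])["name"])  # -> 'Ada'
```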
The records used by the studied graph databases may be unstructured (i.e., not having a pre-
specified format such as JSON), as is the case with KV stores. They can also be structured: document
databases often use the JSON format, wide-column stores have a key-value mapping inside each row,
row-oriented RDBMS divide each row into columns, OODBMS impose some class definition, and
tuple stores as well as some RDF stores use tuples. The details of data layout (i.e., how vertices and
edges are exactly represented and encoded in records) may still vary across different system classes.
Some structured systems still enable highly flexible structure inside their records. For example,
document databases that use JSON or wide-column stores such as Titan and JanusGraph allow for
different key-value mappings for each vertex and edge. Other record based systems are more fixed
in their structure. For example, in OODBMS, one has to define a class for each configuration of
vertex and edge properties. In RDBMS, one has to define tables for each vertex or edge type.
Some systems (e.g., Sparksee, some triple stores, or column-oriented RDBMS) do not store
information about vertices and edges contiguously in dedicated records. Instead, they maintain
separate data structures for each property or label. The information about a given vertex is thus
distributed over different structures. To find a property of a particular vertex, one has to query the
associated data structure (index) for that property and find the value for the given vertex. Examples
of such index structures are B+ trees (in Sparksee) or hashtables (in some RDF systems).
Overall, most systems use records to store vertices, most often one vertex per one record. Some
systems store edges in separate records, others store them together with the adjacent vertices. To
find a property of a particular vertex, one has to find a record containing the vertex. The searched
property is either stored directly in that record, or its location is accessible via a pointer. We observe
that these design choices are made arbitrarily. We conclude that an interesting research direction would be to develop formal performance models and use them to guide the most advantageous design choice for each type of storage backend for graph databases.
7.2.3 Adjacencies between Records. Another aspect of a graph data layout is the design of the
adjacency between records. One can either assign each record an ID and then link records to one
another via IDs, or one can use direct memory pointers. Using IDs requires an indexing structure
to find the physical storage address of a record associated with a particular ID. Direct memory
pointers do not require an index for a traversal from one record to its adjacent records. Note that
an index might still be used, for example to retrieve a vertex with a particular property value (in
this context, direct pointers only facilitate resolving adjacency queries between vertices).
Using direct pointers can accelerate graph traversals [188], as additional index traversals are
avoided. Another option is to assign each record a unique ID and use these IDs instead of direct
pointers to refer to other records. On one hand, this requires an additional indexing structure to
find the physical location of a record based on its ID. On the other hand, if the physical location
changes, it is usually easier to update the indexing structure instead of changing all associated
direct pointers, which may come with significant overhead [19]. Here, too, one could make these design choices and tradeoffs more principled by designing performance models to guide them.
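A minimal sketch of ID-based linking is given below (our illustration, not any system's code); note how relocating a record updates a single index entry, while all ID references held by other records stay valid.

```python
# ID-based record linking with an ID -> location index (illustrative sketch).
index = {}    # record ID -> physical location (here: a slot in a list)
storage = []

def insert(record):
    index[record["id"]] = len(storage)
    storage.append(record)

def lookup(rid):
    return storage[index[rid]]  # one extra index hop vs. a direct pointer

def relocate(rid, new_slot):
    """Move a record; only the index entry changes, not the referrers."""
    storage.extend([None] * max(0, new_slot + 1 - len(storage)))
    storage[new_slot] = storage[index[rid]]
    storage[index[rid]] = None
    index[rid] = new_slot

insert({"id": "v1", "next": "v2"})  # v1 refers to v2 by ID, not by address
insert({"id": "v2", "next": None})
relocate("v2", 10)
print(lookup(lookup("v1")["next"]))  # the traversal still works after the move
```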
7.2.4 Storing Data Directly in Indexes. Sometimes graph data is stored directly in an index. Triple
stores use indexes for various permutations of subject, predicate and object to answer queries
efficiently. Jena TDB stores its triple data inside these indexes, but has no triple table itself, since
the indexes already store all necessary data [17]. HyperGraphDB uses a key-value index, namely
Berkeley DB [168], to access its physical storage. This approach also enables sharing primitive data
values with a reference count, so that multiple identical values are stored only once [120].
7.2.5 Storing Strings. Many systems, for example RDF stores, heavily use strings, for example for keeping IDs of different parts of graph datasets. Thus, there are different schemes for maintaining strings. First, many systems support both fixed-size and variable-size strings. This improves performance: the former (encoding, e.g., IDs) can be kept and accessed rapidly in dedicated structures that assume certain specific sizes, while the latter (encoding, e.g., arbitrary text items) often use separate dynamic structures that are slower to access but offer more flexibility [188, 224]. Such dynamic stores can be in-memory structures or even separate files [188].
The considered systems offer other string related optimizations. For example, CGE optimizes how
it stores strings from its triples/quads. Storing multiple long strings per triple/quad is inefficient,
considering the fact that many triples/quads may share strings. Therefore, CGE – similarly to many
other RDF systems – maintains a dictionary that maps strings to unique 48-bit integer identifiers
(HURIs). For this, two distributed hashtables are used (one for mapping strings to HURIs and one
for mapping HURIs to strings). When loading, the strings are sorted and then assigned to HURIs.
This allows integer comparisons (equal, greater, smaller, etc.) to be used instead of more expensive
string comparisons. This approach is shared by, e.g., tuple stores such as WhiteDB.
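A minimal sketch of such a dictionary is given below; the 48-bit limit follows the CGE description above, while the class is our own illustration, and the two plain maps stand in for CGE's two distributed hashtables.

```python
# Minimal sketch of a CGE-style string dictionary (illustrative).
class StringDictionary:
    MAX_ID = (1 << 48) - 1  # identifiers fit into 48 bits

    def __init__(self, strings):
        # Strings are sorted at load time before identifiers are assigned.
        self.str_to_id = {s: i for i, s in enumerate(sorted(set(strings)))}
        self.id_to_str = {i: s for s, i in self.str_to_id.items()}
        assert len(self.str_to_id) <= self.MAX_ID

    def encode(self, s):
        return self.str_to_id[s]

    def decode(self, i):
        return self.id_to_str[i]

d = StringDictionary(["<alice>", "<knows>", "<bob>", "<alice>"])
triple = ("<alice>", "<knows>", "<bob>")
encoded = tuple(d.encode(s) for s in triple)  # integer comparisons from now on
print(encoded, tuple(d.decode(i) for i in encoded))
```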
RDF systems also harness external dedicated libraries for more effective string management.
For example, the RDF String library [179] facilitates constructing RDF strings. Specifically, it
automatically generates string representations of specified values, together with appropriate URI
prefixes. This library can be used to even encode multi-line strings, or conduct string interpolation
with embedded expressions. Another example is RDF String Turtle [214], a package that enables
conversions between the string-based and RDF JSON representations of RDF.
7.2.6 Data Distribution. Almost all considered systems support a multi-server mode and data
replication. Data sharding is also widely supported, but there are some systems that do not offer
this feature. We expect that, with growing dataset sizes, data sharding will ultimately become as
common as data replication. Still, it is more complex to provide. We observe that, while sharding is
as widely supported on graph databases based on non-graph data models (e.g., document stores)
as data replication, there is a significant fraction of native graph databases (both RDF and LPG
based) that offer replication but not sharding. This indicates that non-graph backends are usually more mature in their designs and thus more attractive for industry purposes and for practitioners, while simultaneously implying research potential in native graph database designs. We also observe
that certain systems offer some form of tradeoff between replication and sharding. Specifically,
OrientDB offers a form of sharding, in which not all collections of documents have to be copied
on each server. However, OrientDB does not enable sharding of the collections themselves (i.e.,
distributing one collection across many servers). If an individual collection grows large, it is the
responsibility of the user to partition the collection to avoid any additional overheads. Thus, it would be interesting to research how to automate sharding of single document collections, or even single (large) documents. Another such example is Neo4j, which supports replication and provides a certain level of support for sharding. Specifically, the user can partition the graph and store each partition in a separate database, limiting data redundancy.
Here, another interesting research opportunity would be to accelerate graph database workloads
such as OLAP by harnessing partial data replication. Specifically, many OLAP graph database
workloads such as BFS can be expressed with linear algebra operations [130], and these workloads
could benefit from the partial replication of the adjacency matrix, for example as done in 2.5D and 3D matrix multiplications [205, 206].
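To illustrate the formulation, the following sketch (a common GraphBLAS-style pattern [130], not any system's code) computes BFS levels with one vector-matrix product over the adjacency matrix per level; it is exactly this product that partial replication of the adjacency matrix would accelerate.

```python
# BFS expressed with linear algebra (illustrative dense-numpy sketch).
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=np.int64)  # A[u, v] = 1 iff edge u -> v

def bfs_levels(A, source):
    n = A.shape[0]
    level = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    depth = 0
    while frontier.any():
        level[frontier] = depth
        # One BFS step is one vector-matrix product over the adjacency
        # matrix, masked by the set of not-yet-visited vertices.
        frontier = ((frontier.astype(np.int64) @ A) > 0) & (level == -1)
        depth += 1
    return level

print(bfs_levels(A, 0))  # -> [0 1 1 2]
```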
7.2.7 Indexes. Most graph database systems use indexes. Systems based on non-graph backends, for example RDBMS or document stores, usually rely on the existing indexing infrastructure present in such systems. Native graph databases employ index structures for the neighborhoods of each vertex, often in the form of direct pointers [188].
Neighborhood indexes are mostly used to speed up access to adjacency lists and thus accelerate
traversal queries. JanusGraph calls these indexes vertex-centric. They are constructed specifically
for vertices, so that incident edges can be filtered efficiently to match the traversal conditions [23].
While JanusGraph allows multiple vertex-centric indexes per vertex, each optimized for different
conditions, which are then chosen by the query optimizer, simpler solutions exist as well. LiveGraph
uses a two-level hierarchy, where the first level distinguishes edges by their label, before pointing
to the actual physical storage [230]. Graphflow indexes the neighbors of a vertex into forward and
backward adjacency lists, where each list is first partitioned by the edge label, and secondly by the
label of the neighbor vertex [128]. Another example is Sparksee, which uses various index structures to find the adjacent vertices and properties of a vertex [151].
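The following sketch illustrates the general idea of a label-partitioned, vertex-centric index (a simplification in the spirit of JanusGraph and Graphflow; real systems use sorted on-disk structures rather than in-memory dictionaries).

```python
# Sketch of a vertex-centric index partitioned by edge label (illustrative).
from collections import defaultdict

# (vertex, direction, edge label) -> list of (neighbor, edge properties)
index = defaultdict(list)

def add_edge(u, v, label, props=None):
    index[(u, "out", label)].append((v, props or {}))
    index[(v, "in", label)].append((u, props or {}))

add_edge("alice", "bob", "knows", {"since": 2019})
add_edge("alice", "acme", "works_at")
add_edge("carol", "alice", "knows")

# A traversal step filters incident edges by label without scanning the
# whole neighborhood of 'alice':
print(index[("alice", "out", "knows")])  # -> [('bob', {'since': 2019})]
print(index[("alice", "in", "knows")])   # -> [('carol', {})]
```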
An interesting research opportunity is to design indexes for richer “higher-order” structural infor-
mation, beyond plain neighborhoods. Specifically, a recent wave of pattern matching schemes [41,
158] indicates the importance of higher-order graph structure, such as triangles that each vertex
belongs to. Indexing such information would significantly speed up queries related to clique mining,
dense subgraph discovery, clustering, and many others. While some work has been done in this
respect [196], many designs could be proposed.
Data indexes concern data beyond the neighborhood information, and they can be used to
accelerate query plan generation and query execution. It is possible, for example, to index all vertices
that have a specific property (value). They are usually employed to speed up Business Intelligence
workloads (details on workloads are in § 4). Many triple stores, for example AllegroGraph [92],
provide all six permutations of subject (S), predicate (P), and object (O) as well as additional
aggregated indexes. However, to reduce associated costs, other approaches exist as well: TripleBit
uses just two permutations (PSO, POS) with two aggregated indexes (SP, SO) and two auxiliary
index structures [228]. gStore implements pattern matching queries with the help of two index
structures: a VS*-tree, which is a specialized B+-tree, and a trie-based T-index [231]. Some database
systems like Amazon Neptune [5] or AnzoGraph [55] only provide implicit indexes, while still claiming to answer all kinds of queries efficiently. However, most graph database systems
allow the user to explicitly define data indexes. Some of them, like Azure Cosmos DB [161], support
composite indexes (a combination of different labels/properties) for more specific use cases. In
addition to internal indexes, some systems employ external indexing tools. For example, Titan and
JanusGraph [23] use internal indexing for label- and value-based lookups, but rely on external indexing backends (e.g., Elasticsearch [82] or Apache Solr [13]) for non-trivial lookups involving multiple properties, ranges, or full-text search.

Graph Database System | Tree | Hashtable | Skip list | Additional remarks
----------------------|------|-----------|-----------|-------------------
Apache Jena TDB       | ✓∗   | ✗         | ✗         | ∗ B+-tree
ArangoDB              | ✗    | ✓∗        | ✓∗        | ∗ depends on the used index engine
Blazegraph            | ✓∗   | ✗         | ✗         | ∗ B+-tree
Dgraph                | ✓    | ✗         | ✗         |
Memgraph              | ✗    | ✗         | ✓         |
OrientDB              | ✓∗   | ✓‡        | ✗         | ∗ SB-tree with base 500; ‡ also supports a distributed hashtable index
VelocityGraph         | ✓∗   | ✗         | ✗         | ∗ B-tree
Virtuoso              | ✓∗   | ✗         | ✗         | ∗ 2D R-tree
WhiteDB               | ✓∗   | ✗         | ✗         | ∗ T-tree

Table 5. Support for different index implementations in different graph database systems. "✓": a system supports a given index implementation. "✗": a system does not support a given index implementation.
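Returning to the permutation-based data indexes discussed above, the following sketch illustrates why triple stores maintain several sort orders of the same triples: each permutation turns a different triple pattern into a cheap prefix scan (illustrative code, not any system's implementation).

```python
# Sketch of triple-store permutation indexes answered via prefix scans.
from bisect import bisect_left, bisect_right

triples = [("alice", "knows", "bob"),
           ("alice", "worksAt", "acme"),
           ("bob", "knows", "carol")]

spo = sorted((s, p, o) for s, p, o in triples)  # subject-predicate-object
pos = sorted((p, o, s) for s, p, o in triples)  # predicate-object-subject

def prefix_scan(index, prefix):
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + ("\uffff",) * (3 - len(prefix)))
    return index[lo:hi]

# Pattern (alice, ?, ?): a prefix scan on the SPO index.
print(prefix_scan(spo, ("alice",)))
# Pattern (?, knows, carol): a prefix scan on the POS index.
print(prefix_scan(pos, ("knows", "carol")))
```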
Structural indexes are used for various internal data. Here, LiveGraph uses a vertex index to
map its vertex IDs to a physical storage location [230]. ArangoDB uses a hybrid index, a hashtable,
to find the documents of incident edges and adjacent vertices of a vertex [18].
We categorize systems (for which we were able to find this information) according to this criterion in Table 5. We find no clear connection between the index type and the backend of a graph database, but most systems use tree-based indexes. A research opportunity, useful especially for
practitioners, would be to conduct a detailed and broad performance comparison of different index
implementations for different workloads.
7.2.8 Data Organization vs. Database Performance. Record based systems usually deliver higher performance for queries that need to retrieve all or most information about a vertex or an edge. They are more efficient because the required data is stored in consecutive memory blocks. In systems that store data in indexes, one queries a data structure per property, which results in a more random access pattern. On the other hand, if one only wants to retrieve single properties of vertices or edges, such systems may only have to retrieve a single value. In contrast, many record based systems cannot retrieve only parts of records, fetching more data than necessary.
Furthermore, a decision on whether to use IDs versus direct memory pointers to link records
depends on the read/write ratio of the workload for the given system. In the former case, one
has to use an indexing structure to find the address of the record. This slows down read queries
compared to following direct pointers. However, write queries can be more efficient with the use of
IDs instead of pointers. For example, when a record has to be moved to a new address, all pointers to this record need to be updated to reflect this new address. IDs, in contrast, remain the same; only the indexing structure needs to be updated with the new address of the given record.
The available performance studies [9, 133, 153, 154, 220] indicate that systems based on non-graph data models, for example document stores or wide-column stores, usually achieve higher performance for transactional workloads that update the graph. In contrast, read-only workloads (both simple and global analytics) often achieve higher performance on native graph stores. Global analytics particularly benefit from native graph stores that ensure parallelization of single queries [153]. This underlines the potential and research opportunities for developing hybrid database systems for graph workloads that could combine the advantages of databases using non-graph data models and of native graph stores.
7.3 Discussion and Takeaways on Query Execution
We discuss the query execution aspects of our taxonomy with respect to the specific graph databases.
Our discussion is by necessity brief, as most systems do not disclose this information.7
7 There is usually much more information available on the data layout of a graph database than on its execution engine.
Optimizing Parallel Execution. Some of the systems that support parallel query execution
explicitly optimize the amount of data communicated when executing such parallelized queries. For
example, the computation in CGE is distributed over the participating processes. To minimize the
amount of all-to-all communication, query results are aggregated locally and – whenever possible –
each process only communicates with a few peers to avoid network congestion. Another way to
minimize communication, used by MS Graph Engine and the underlying Trinity database, is to
reduce the sizes of data chunks exchanged by processes. For this, Trinity maintains special accessors
that allow for accessing single attributes within a cell without needing to load the complete cell.
This lowers the I/O cost for many operations that do not need the whole cells. Several systems
harness one-sided communication, enabling processes to access one another’s data directly [98]. For
example, Trinity can be deployed on InfiniBand [118] to leverage Remote Direct Memory Access
(RDMA) [98]. Similarly, Cray’s infrastructure makes memory resources of multiple compute nodes
available as a single global address space, also enabling one-sided communication in CGE. This
facilitates parallel programming in a distributed environment [34, 98, 198].
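The following sketch illustrates the accessor idea under a hypothetical cell layout of our own: a single fixed-offset attribute is read from a serialized cell without deserializing the complete cell.

```python
# Sketch of a Trinity-style accessor (illustrative; the layout is hypothetical).
import struct

CELL = struct.Struct("<qi32s")  # cell ID, age, fixed-width name

def make_cell(cell_id, age, name):
    return CELL.pack(cell_id, age, name.encode().ljust(32, b"\0"))

def read_age(buf):
    # Accessor: unpack only the 4-byte 'age' field at its known offset (8)
    # instead of unpacking the whole 44-byte cell.
    (age,) = struct.unpack_from("<i", buf, 8)
    return age

cell = make_cell(7, 42, "alice")
print(read_age(cell))  # -> 42
```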
Here, a promising research opportunity is to harness established optimized communication
routines such as collectives [59] for large-scale OLAP and BI.
Other Execution Optimizations. The considered databases come with numerous other system-specific design optimizations. For example, an optimization in ArangoDB's design allows skipping the vertex document and directly accessing one edge document based on the vertex ID stored within another edge document. This may improve cache efficiency and thus reduce query execution time [19]. Another example is Oracle Spatial and Graph, which offers an interesting option of switching its data backend based on the query being executed. Specifically, its in-memory analysis can be accelerated by replacing the underlying relational storage with the native graph storage provided by the PGX processing engine [81, 117, 191]. In such a configuration, Oracle Spatial and
Graph effectively becomes a native graph database. PGX comes with two variants, PGX.D and
PGX.SM, that – respectively – offer distributed and shared-memory processing capabilities [117].
Furthermore, some systems use the LPG specifics to implement OLAP queries more effectively. For example, to implement BFS, Weaver uses special designated properties associated with vertices to indicate whether a vertex has already been visited. Such special properties are not visible to an external user of the graph database and are usually deallocated after a given query is finalized.
A lot of work has been done into optimizing distributed-memory algebraic operations such as
matrix products using optimal communication routines [96, 100, 137–139, 205, 206]. It would be
interesting to investigate how such routines can be used to speed up OLAP graph queries.
There is a large body of existing work in the design of dynamic graph processing frameworks [33].
These systems differ from graph databases in several aspects, for example they often employ simple
graph models (and not LPG or RDF). Simultaneously, they share the fundamental property of graph
databases: dealing with a dynamic graph with evolving structure. Moreover, different performance
analyses indicate that streaming frameworks are much faster (up to orders of magnitude) than
graph databases, especially in the context of raw graph updates per second [154, 220]. This suggests that harnessing mechanisms used in such frameworks in the context of graph databases could significantly enhance the performance of the latter, and is a potentially fruitful research direction.
Furthermore, while there exists past research into the impact of the underlying network on the performance of a distributed graph analytics framework [172], little has been done to investigate this performance relationship in the context of graph database workloads. To the best of our knowledge, there are no efforts toward developing a topology-aware or routing-aware data distribution scheme for
graph databases, especially in the context of recently proposed data center and high-performance
computing network topologies [35, 132] and routing architectures [32].
Finally, in contrast to general static graph processing and graph streaming, little research exists on accelerating graph databases using different types of hardware architectures, accelerators,
and hardware-related designs, for example FPGAs [39], designs related to network interface cards
such as SmartNICs [73], or processing in memory [4].
7.3.2 ACID. We also discuss various aspects of ACID transactions. ACID transactions are usually
used to implement OLTP queries. OLAP queries are read-only analytics, and thus are less relevant
for the notion of ACID. For example, in Weaver, OLAP workloads are referred to as “node programs”
and are treated differently from transactions, which are used to implement OLTP queries.
Support. Overall, support for ACID transactions is widespread in graph databases. However, there are some differences between the respective system classes. For example, all considered document and RDBMS graph databases offer full ACID support. In contrast, only around half of all considered key-value and wide-column based systems support ACID transactions. This could be caused by the fact that some backends have more mature transaction related designs.
Implementation. Two important ways to implement transactions are locking and timestamps. Neo4j uses write locks to protect modifications until they are committed. Weaver uses timestamps to reorder transactions into a serializable order if necessary. Other systems such as LiveGraph combine both approaches, using timestamps to select the correct version of a vertex/edge (solving simple read/write conflicts) and vertex locks to deal with concurrent writes.
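The sketch below illustrates how the two mechanisms can be combined in the spirit of LiveGraph; the field names and the protocol details are hypothetical simplifications.

```python
# Timestamps for version selection, locks for concurrent writes (sketch).
import threading

class VersionedVertex:
    def __init__(self):
        self.versions = []            # list of (commit timestamp, properties)
        self.lock = threading.Lock()  # vertex lock serializes writers

    def read(self, ts):
        """Return the latest version committed at or before timestamp ts."""
        visible = [v for t, v in self.versions if t <= ts]
        return visible[-1] if visible else None

    def write(self, ts, props):
        with self.lock:  # concurrent writers are serialized by the lock
            self.versions.append((ts, props))

v = VersionedVertex()
v.write(10, {"age": 41})
v.write(20, {"age": 42})
print(v.read(15))  # readers pick a version by timestamp -> {'age': 41}
print(v.read(25))  # -> {'age': 42}
```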
7.3.3 Support for OLAP and OLTP Queries. We analyze support for OLTP and OLAP. Both categories are widely supported, but with certain differences across specific backend classes: (1) all considered document stores focus solely on OLTP, (2) some RDBMS graph
databases do not support or focus on OLAP, and (3) some native graph databases do not support
OLTP. We conjecture that this is caused by the prevalent historic use cases of these systems, and the
associated features of the backend design. For example, document stores have traditionally mostly focused on maintaining document related data and answering simple queries, instead of running complicated global graph analytics. Thus, it may be very challenging to ensure high performance
of such global workloads on this backend class. Instead, native graph databases work directly with
the graph data model, making it simpler to develop fast traversals and other OLAP workloads.
As for RDBMS, they were traditionally not associated with graph global workloads. However,
graph analytics based on RDBMS has become a separate and growing area of research. Zhao et
al. [229] study the general use of RDBMS for graphs. They define four new relational algebra
operations for modeling graph operations. They show how to define these four operations with
six smaller building blocks: basic relational algebra operations, such as group-by and aggregation.
Xirogiannopoulos et al. [225] describe GraphGen, an end-to-end graph analysis framework that
is built on top of an RDBMS. GraphGen supports graph queries through so-called Graph-Views
that define graphs as transformations over underlying relational datasets. This provides a graph
modeling abstraction, and the underlying representation can be optimized independently.
Some document stores still provide at least partial support for traversal-like workloads. For
example, in ArangoDB, documents are indexed using a hashtable, where the _key attribute serves
as the hashtable key. A traversal over the neighbors of a given vertex works as follows. First, given
the _key of a vertex 𝑣, ArangoDB finds all 𝑣’s adjacent edges using the hybrid index. Next, the
system retrieves the corresponding edge documents and fetches all the associated _to properties.
Finally, the _to properties serve as the new _key properties when searching for the neighboring
vertices.
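A minimal sketch of this traversal, with the hybrid index reduced to an in-memory dictionary, could look as follows (illustrative; ArangoDB's actual implementation differs).

```python
# Sketch of the hashtable-based traversal described above (illustrative).
vertices = {"alice": {"_key": "alice", "name": "Alice"},
            "bob":   {"_key": "bob",   "name": "Bob"}}
edges = [{"_from": "alice", "_to": "bob", "label": "knows"}]

# The "hybrid index": vertex _key -> documents of its outgoing edges.
out_edges = {}
for e in edges:
    out_edges.setdefault(e["_from"], []).append(e)

def out_neighbors(key):
    # 1. Find the adjacent edge documents via the index; 2. fetch their _to
    #    properties; 3. use them as _key lookups for the neighbor vertices.
    return [vertices[e["_to"]] for e in out_edges.get(key, [])]

print(out_neighbors("alice"))  # -> [{'_key': 'bob', 'name': 'Bob'}]
```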
There are other correlations between supported workloads and system design features. For
instance, we observe that systems that do not target OLTP also often do not provide, or do not focus on, ACID transactions. This is because ACID is not commonly used with OLAP. Examples include Cray
Graph Engine, RedisGraph, or Graphflow.
There also exist many OLAP graph workloads that have been largely unaddressed by the design
and performance analyses of existing graph database systems. This includes vertex reordering problems (e.g., listing vertices by their degeneracy) or optimization problems (e.g., graph coloring) [31]. These problems were considered in the context of graph algorithms processing simple graphs; incorporating rich models such as RDF would further increase their complexity and offer many associated research challenges, for example designing indexes, data layouts, or distribution strategies.
7.3.4 Supported Languages. We also analyze support for graph query languages. Some types of
backends focus on one specific language: triple stores and SPARQL, document stores and Gremlin,
wide-column stores and Gremlin, RDBMS and SQL. Other classes are not distinctively correlated
with some specific language, although Cypher seems most popular among LPG based native graph
stores. Usually, the query language support is primarily affected by the supported conceptual graph
model; if it is RDF, then the system usually supports SPARQL while systems focusing on LPG often
support Cypher or Gremlin.
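For illustration, the same one-hop query ("names of the people Alice knows"), assuming a toy social-network schema of our own, could look as follows in the three most common languages (shown here as Python string literals).

```python
# The same pattern in SPARQL, Cypher, and Gremlin (toy schema assumed).
sparql = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?a foaf:name  "Alice" .
  ?a foaf:knows ?b .
  ?b foaf:name  ?name .
}"""

cypher = """
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person)
RETURN b.name"""

gremlin = "g.V().has('Person', 'name', 'Alice').out('knows').values('name')"
```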
Several systems come with their own languages, or variants of the established ones. For example,
in MS Graph Engine, cells are associated with a schema that is defined using the Trinity Specification
Language (TSL) [199]. TSL enables defining the structure of cells similarly to C-structs. For example,
a cell can hold data items of different data types, including IDs of other cells. Moreover, querying
graphs in Oracle Spatial and Graph is possible using PGQL [221], a declarative, SQL-like, graph
pattern matching query language. PGQL is designed to match the hybrid structure of Oracle Spatial
and Graph, and it allows for querying both data stored on disk in Oracle Database and in-memory parts of graph datasets.
Besides their primary language, systems also offer support for additional language functionalities.
For example, Oracle Spatial and Graph also supports SQL and SPARQL (for RDF graphs). Moreover,
the offered Java API implements Apache TinkerPop interfaces, including the Gremlin API.
8 CONCLUSION
Graph databases constitute an important area of academic research and different industry efforts.
They are used to maintain, query, and analyze numerous datasets in different domains in industry
and academia. Many graph databases of different types have been developed. They use many data
models and representations, they are constructed using miscellaneous design choices, and they
enable a large number of queries and workloads. In this work, we provide the first survey and
taxonomy of this rich graph database landscape. Our work can be used not only by researchers
willing to learn more about this fascinating subject, but also by architects, developers, and project
managers who want to select the most advantageous graph database system or design.
Acknowledgments
We thank Gabor Szarnyas for extensive feedback, and Hannes Voigt, Matteo Lissandrini, Daniel
Ritter, Lukas Braun, Janez Ales, Nikolay Yakovets, and Khuzaima Daudjee for insightful remarks.
We thank Timo Schneider for immense help with infrastructure at SPCL, and PRODYNA AG (Darko
Križić, Jens Nixdorf, Christoph Körner) for generous support. This project received funding from the
European Research Council (Project PSAP, No. 101002047), and the European High-Performance
Computing Joint Undertaking (JU) under grant agreement No. 955513 (MAELSTROM). This project
was supported by the ETH Future Computing Laboratory (EFCL), financed by a donation from
Huawei Technologies. This project received funding from the European Union’s HE research and
innovation programme under the grant agreement No. 101070141 (Project GLACIATION).
References
[1] Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2007. Scalable Semantic Web Data Management
Using Vertical Partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB ’07).
411–422.
[2] Sunitha Abburu and Suresh Babu Golla. 2015. Effective Partitioning and Multiple RDF Indexing for Database Triple
Store. Engineering Journal 19, 5 (Oct. 2015), 139–154.
[3] Christopher R. Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2017. Empty-
Headed: A Relational Engine for Graph Processing. ACM Trans. Database Syst. 42, 4, Article 20 (Oct 2017), 44 pages.
[4] Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A Scalable Processing-in-Memory
Accelerator for Parallel Graph Processing. In Proceedings of the 42nd Annual International Symposium on Computer
Architecture (ISCA ’15). ACM, 105–117.
[5] Amazon. 2018. Amazon Neptune. Available at https://aws.amazon.com/neptune/. (2018).
[6] Amazon. 2022. Columnar Storage Developer Guide. https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_
storage_disk_mem_mgmnt.html. (2022).
[7] Renzo Angles, Marcelo Arenas, Pablo Barcelo, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker,
Marcus Paradies, Stefan Plantikow, Juan Sequeda, Oskar van Rest, and Hannes Voigt. 2018. G-CORE: A Core for Future
Graph Query Languages. In Proceedings of the International Conference on Management of Data (SIGMOD ’18). ACM,
1421–1432.
[8] Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Domagoj Vrgoč. 2017. Foundations of
Modern Query Languages for Graph Databases. ACM Comput. Surv. 50, 5, Article 68 (Sept. 2017), 40 pages.
[9] Renzo Angles, Peter Boncz, Josep Larriba-Pey, Irini Fundulaki, Thomas Neumann, Orri Erling, Peter Neubauer, Norbert
Martinez-Bazan, Venelin Kotsev, and Ioan Toma. 2014. The Linked Data Benchmark Council: A Graph and RDF
Industry Benchmarking Effort. SIGMOD Rec. 43, 1 (May 2014), 27–31.
[10] Renzo Angles and Claudio Gutierrez. 2008. Survey of Graph Database Models. ACM Comput. Surv. 40, 1, Article 1 (Feb.
2008), 39 pages.
[11] Renzo Angles and Claudio Gutierrez. 2018. An Introduction to Graph Data Management. In Graph Data Management,
Fundamental Issues and Recent Developments, George H. L. Fletcher, Jan Hidders, and Josep Lluís Larriba-Pey (Eds.).
Springer, 1–32.
[12] Renzo Angles, Aidan Hogan, Ora Lassila, Carlos Rojas, Daniel Schwabe, Pedro Szekely, and Domagoj Vrgoč. 2022.
Multilayer Graphs: A Unified Data Model for Graph Databases. In Proceedings of the 5th ACM SIGMOD Joint International
Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-
NDA ’22). Article 11, 6 pages.
[13] Apache Software Foundation. 2004. Apache Solr. Available at https://solr.apache.org/. (2004).
[14] Apache Software Foundation. 2008. Apache Cassandra. Available at https://cassandra.apache.org/. (2008).
[15] Apache Software Foundation. 2013. Apache Giraph. Available at https://giraph.apache.org/. (2013).
[16] Apache Software Foundation. 2018. Apache Marmotta. Available at http://marmotta.apache.org/. (2018).
[17] Apache Software Foundation. 2021. Apache Jena TDB. Available at https://jena.apache.org/documentation/tdb/index.
html. (2021).
[18] ArangoDB Inc. 2018. ArangoDB. Available at https://www.arangodb.com/docs/stable/data-models.html. (2018).
[19] ArangoDB Inc. 2018. ArangoDB: Index Free Adjacency or Hybrid Indexes for Graph Databases. Available at https:
//www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/. (2018).
[20] Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: A Database
Benchmark Based on the Facebook Social Graph. In Proceedings of the International Conference on Management of Data
(SIGMOD ’13). ACM, 1185–1196.
[21] Malcolm Atkinson, David DeWitt, David Maier, François Bancilhon, Klaus Dittrich, and Stanley Zdonik. 1989. The
Object-Oriented Database System Manifesto. In Proceedings of the First International Conference on Deductive and
Object-Oriented Databases (DOOD ’89). 223–240.
[22] Paolo Atzeni and Valeria De Antonellis. 1993. Relational Database Theory. Addison-Wesley.
[23] Aurelius. 2015. Titan Data Model. Available at http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html. (2015).
[24] Song Bai, Feihu Zhang, and Philip H. S. Torr. 2021. Hypergraph convolution and hypergraph attention. Pattern
Recognition 110 (Feb. 2021), 107637.
[25] Alexandru T. Balaban. 1985. Applications of Graph Theory in Chemistry. J. Chem. Inf. Comput. Sci. 25, 3 (Aug. 1985),
334–343.
[26] Sumita Barahmand and Shahram Ghandeharizadeh. 2013. BG: A Benchmark to Evaluate Interactive Social Networking
Actions. In Proceedings of the 6th Biennial Conference on Innovative Data Systems Research (CIDR ’13).
[27] Jesús Barrasa. 2017. RDF Triple Stores vs. Labeled Property Graphs: What’s the Difference? Available at https:
//neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference/. (2017).
[28] Daniel Bartholomew. 2012. MariaDB vs. MySQL. Dostopano 7, 10 (2012).
[29] Omar Batarfi, Radwa El Shawi, Ayman G Fayoumi, Reza Nouri, Seyed-Mehdi-Reza Beheshti, Ahmed Barnawi, and
Sherif Sakr. 2015. Large Scale Graph Processing Systems: Survey and an Experimental Evaluation. Cluster Computing
18, 3 (2015), 1189–1213.
[30] Scott Beamer, Krste Asanović, and David Patterson. 2015. The GAP Benchmark Suite. (2015). arXiv:1508.03619
[31] Maciej Besta, Armon Carigiet, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, and Torsten Hoefler. 2020.
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality. In Proceedings of the
International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20). IEEE, Article 99,
17 pages.
[32] Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla,
and Torsten Hoefler. 2021. High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC
Networks. IEEE Transactions on Parallel and Distributed Systems 32, 4 (2021), 943–959.
[33] Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, and Torsten Hoefler. 2023. Practice of Streaming
Processing of Dynamic Graphs: Concepts, Models, and Systems. IEEE Transactions on Parallel and Distributed Systems
34, 6 (2023), 1860–1876.
[34] Maciej Besta and Torsten Hoefler. 2014. Fault Tolerance for Remote Memory Access Programming Models. In
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’14).
ACM, 37–48.
[35] Maciej Besta and Torsten Hoefler. 2014. Slim Fly: A Cost Effective Low-Diameter Network Topology. In Proceedings of
the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’14). IEEE, 348–359.
[36] Maciej Besta and Torsten Hoefler. 2018. Survey and Taxonomy of Lossless Graph Compression and Space-Efficient
Graph Representations. (2018). arXiv:1806.01799
[37] Maciej Besta, Florian Marending, Edgar Solomonik, and Torsten Hoefler. 2017. SlimSell: A Vectorizable Graph
Representation for Breadth-First Search. In Proceedings of the International Parallel and Distributed Processing Symposium
(IPDPS ’17). IEEE, 32–41.
[38] Maciej Besta, Michał Podstawski, Linus Groner, Edgar Solomonik, and Torsten Hoefler. 2017. To Push or To Pull:
On Reducing Communication and Synchronization in Graph Computations. In Proceedings of the 26th International
Symposium on High-Performance Parallel and Distributed Computing (HPDC ’17). ACM, 93–104.
[39] Maciej Besta, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, and Torsten Hoefler. 2019. Graph Processing
on FPGAs: Taxonomy, Survey, Challenges. (2019). arXiv:1903.06697
[40] Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, and Torsten Hoefler. 2018. Log(Graph):
A Near-Optimal High-Performance Graph Representation. In Proceedings of the 27th International Conference on Parallel
Architectures and Compilation Techniques (PACT ’18). ACM, Article 7, 13 pages.
[41] Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi,
Jakub Beranek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Ozdemir, Adrian Balla,
Marcin Copik, Philipp Lindenberger, Marek Konieczny, Onur Mutlu, and Torsten Hoefler. 2021. GraphMineSuite:
Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra. Proc. VLDB Endow. 14, 11
(Jul 2021), 1922–1935.
[42] Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, and Torsten
Hoefler. 2019. Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and
Analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and
Analysis (SC ’19). ACM, Article 35, 25 pages.
[43] Bitnine Global Inc. 2016. AgensGraph. Available at https://bitnine.net/agensgraph/. (2016).
[44] Blazegraph. 2018. BlazeGraph DB. Available at https://www.blazegraph.com/. (2018).
[45] Guy E. Blelloch and Bruce M. Maggs. 2010. Parallel Algorithms. In Algorithms and Theory of Computation Handbook:
Special Topics and Techniques. Chapman & Hall/CRC.
[46] Scott Boag, Don Chamberlin, Mary F Fernández, Daniela Florescu, Jonathan Robie, Jérôme Siméon, and Mugur
Stefanescu. 2002. XQuery 1.0: An XML Query Language.
[47] Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A Multiresolution
Coordinate-Free Ordering for Compressing Social Networks. In Proceedings of the 20th International Conference on
World Wide Web (WWW ’11). ACM, 587–596.
[48] Angela Bonifati, George Fletcher, Hannes Voigt, and Nikolay Yakovets. 2018. Querying Graphs. Springer.
[49] Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology 25, 2 (2001),
163–177.
[50] Tim Bray. 2014. The JavaScript Object Notation (JSON) Data Interchange Format. RFC 7159. (2014).
[51] Tim Bray, Jean Paoli, C Michael Sperberg-McQueen, Eve Maler, and Franois Yergeau. 2000. Extensible Markup Language
(XML) 1.0.
[52] Jeen Broekstra, Arjohn Kampman, and Frank van Harmelen. 2002. Sesame: A Generic Architecture for Storing and
Querying RDF and RDF Schema. In Proceedings of the International Semantic Web Conference (ISWC ’02), Ian Horrocks
and James Hendler (Eds.). Lecture Notes in Computer Science, Vol. 2342. Springer, 54–68.
[53] Callidus Software Inc. 2010. OrientDB. Available at https://orientdb.org. (2010).
[54] Callidus Software Inc. 2018. OrientDB: Lightweight Edges. Available at https://orientdb.com/docs/3.0.x/java/
Lightweight-Edges.html. (2018).
[55] Cambridge Semantics. 2018. AnzoGraph. Available at https://www.cambridgesemantics.com/anzograph/. (2018).
[56] Mihai Capotă, Tim Hegeman, Alexandru Iosup, Arnau Prat-Pérez, Orri Erling, and Peter Boncz. 2015. Graphalytics: A
Big Data Benchmark for Graph-Processing Platforms. In Proceedings of the GRADES’15. ACM, Article 7, 6 pages.
[57] Arnaud Castelltort and Anne Laurent. 2013. Representing history in graph-oriented NoSQL databases: A versioning
system. In Proceedings of the Eighth International Conference on Digital Information Management (ICDIM ’13). IEEE,
228–234.
[58] Cayley. 2018. CayleyGraph. Available at https://cayley.io/ and https://github.com/cayleygraph/cayley. (2018).
[59] Ernie Chan, Marcel Heimlich, Avi Purkayastha, and Robert van de Geijn. 2007. Collective Communication: Theory,
Practice, and Experience: Research Articles. Concurr. Comput.: Pract. Exper. 19, 13 (Sept. 2007), 1749–1783.
[60] Jiefeng Cheng, Jeffrey Xu Yu, Bolin Ding, Philip S. Yu, and Haixun Wang. 2008. Fast Graph Pattern Matching. In
Proceedings of the 24th International Conference on Data Engineering (ICDE ’08). IEEE, 913–922.
[61] Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One Trillion
Edges: Graph Processing at Facebook-Scale. Proc. VLDB Endow. 8, 12 (Aug 2015), 1804–1815.
[62] Marek Ciglan, Alex Averbuch, and Ladialav Hluchy. 2012. Benchmarking Traversal Operations over Graph Databases.
In Proceedings of the 28th International Conference on Data Engineering Workshops (ICDEW ’12). IEEE, 186–189.
[63] James Clark and Steve DeRose. 1999. XML Path Language (XPath) Version 1.0.
[64] Edgar F. Codd. 1989. Relational Database: A Practical Foundation for Productivity. In Readings in Artificial Intelligence
and Databases. Elsevier, 60–68.
[65] Douglas Comer. 1979. The Ubiquitous B-Tree. ACM Comput. Surv. 11, 2 (June 1979), 121–137.
[66] Richard Cyganiak, David Wood, and Markus Lanthaler. 2014. RDF 1.1 Concepts and Abstract Syntax.
[67] George Bernard Dantzig and D. R. Fulkerson. 1955. On the Max Flow Min Cut Theorem of Networks. In RAND
Corporation Paper Series.
[68] DataStax Inc. 2018. DSE Graph (DataStax). Available at https://www.datastax.com/. (2018).
[69] Chris J. Date and Hugh Darwen. 1987. A Guide to the SQL Standard. Addison-Wesley.
[70] Ali Davoudian, Liu Chen, and Mengchi Liu. 2018. A Survey on NoSQL Stores. ACM Comput. Surv. 51, 2, Article 40
(April 2018), 43 pages.
[71] Dgraph Labs Inc. 2017. BadgerDB. Available at https://dbdb.io/db/badgerdb. (2017).
[72] Dgraph Labs, Inc. 2018. DGraph. Available at https://dgraph.io/ and https://dgraph.io/docs/. (2018).
[73] Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej
Besta, Luca Benini, Duncan Roweth, and Torsten Hoefler. 2019. Network-Accelerated Non-Contiguous Memory
Transfers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and
Analysis (SC ’19). ACM, Article 56, 14 pages.
[74] E. W. Dijkstra. 1959. A Note on Two Problems in Connexion with Graphs. Numer. Math. 1, 1 (Dec. 1959), 269–271.
[75] Niels Doekemeijer and Ana Lucia Varbanescu. 2014. A Survey of Parallel Graph Processing Frameworks. Technical
Report. Delft University of Technology.
[76] David Dominguez-Sal, Peter Urbón-Bayes, Aleix Giménez-Vanó, Sergio Gómez-Villamor, Norbert Martínez-Bazan, and
Josep-Lluis Larriba-Pey. 2010. Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark.
In Proccedings of the International Conference on Web-Age Information Management (WAIM ’10), Heng Tao Shen, Jian
Pei, M. Tamer Özsu, Lei Zou, Jiaheng Lu, Tok-Wang Ling, Ge Yu, Yi Zhuang, and Jie Shao (Eds.). Lecture Notes in
Computer Science, Vol. 6185. Springer, 37–48.
[77] Ayush Dubey, Greg D. Hill, Robert Escriva, and Emin Gün Sirer. 2016. Weaver: A High-Performance, Transactional
Graph Database Based on Refinable Timestamps. Proc. VLDB Endow. 9, 11 (July 2016), 852–863.
[78] Paul DuBois. 1999. MySQL. New Riders Publishing.
[79] William Eberle, Jeffrey Graves, and Lawrence Holder. 2010. Insider Threat Detection Using a Graph-Based Approach.
Journal of Applied Security Research 6, 1 (2010), 32–81.
[80] David Ediger, Rob McColl, Jason Riedy, and David A Bader. 2012. STINGER: High performance data structure for
streaming graphs. In Conference on High Performance Extreme Computing (HPEC ’12). IEEE, 1–5.
[81] Hamid El Maazouz, Guido Wachsmuth, Martin Sevenich, Dalila Chiadmi, Sungpack Hong, and Hassan Chafi. 2019. A
DSL-Based Framework for Performance Assessment. In Proceedings of the International Conference Europe Middle East
& North Africa Information Systems and Technologies to Support Learning (EMENA-ISTL ’19) (Learning and Analytics in
Intelligent System), Mohammed Serrhini, Carla Silva, and Sultan Aljahdali (Eds.), Vol. 7. Springer, 260–270.
[82] Elastic. 2010. Elasticsearch. Available at https://www.elastic.co/elasticsearch/. (2010).
[83] Ramez Elmasri and Shamkant B. Navathe. 2011. Advantages of Distributed Databases. In Fundamentals of Database
Systems (6th ed.). Pearson, Chapter 25.1.5, 882.
[84] Ramez Elmasri and Shamkant B. Navathe. 2011. Data Fragmentation. In Fundamentals of Database Systems (6th ed.).
Pearson, Chapter 25.4.1, 894–897.
[85] Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and
Peter Boncz. 2015. The LDBC Social Network Benchmark: Interactive Workload. In Proceedings of the International
Conference on Management of Data (SIGMOD ’15). ACM, 619–630.
[86] FactNexus. 2018. GraphBase. Available at https://graphbase.ai/. (2018).
[87] Jing Fan, Adalbert Gerald Soosai Raj, and Jignesh M. Patel. 2015. The Case Against Specialized Graph Analytics
Engines. In Proceedings of the 7th Biennial Conference on Innovative Data Systems Research (CIDR ’15).
[88] Fauna. 2016. FaunaDB. Available at https://fauna.com/. (2016).
[89] Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. 2023. KÙZU Graph Database Management
System. In Proceedings of the Conference on Innovative Data Systems Research (CIDR ’23).
[90] Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak,
Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoč. 2023. A Researcher’s Digest of GQL. In Proceedings of
the 26th International Conference on Database Theory (ICDT ’23), Floris Geerts and Brecht Vandevoort (Eds.). Leibniz
International Proceedings in Informatics (LIPIcs), Vol. 255. 1:1–1:22.
[91] Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow,
Mats Rydberg, Petra Selmer, and Andrés Taylor. 2018. Cypher: An Evolving Query Language for Property Graphs. In
Proceedings of the International Conference on Management of Data (SIGMOD ’18). ACM, 1433–1445.
[92] Franz Inc. 2018. AllegroGraph. Available at https://allegrograph.com/products/allegrograph/. (2018).
[93] Santhosh Kumar Gajendran. 2012. A Survey on NoSQL Databases. (2012).
[94] Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2002. Data Replication. In Database Systems: The
Complete Book (1st ed.). Pearson, Chapter 19.4.3, 1021.
[95] Ewout Gelling, George Fletcher, and Michael Schmidt. 2023. Bridging graph data models: RDF, RDF-star, and property
graphs as directed acyclic graphs. (2023). arXiv:2304.13097
[96] Evangelos Georganas, Jorge González-Domínguez, Edgar Solomonik, Yili Zheng, Juan Touriño, and Katherine Yelick.
2012. Communication Avoiding and Overlapping for Numerical Linear Algebra. In Proceedings of the International
Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12). IEEE, Article 100, 11 pages.
[97] Lars George. 2011. HBase: The Definitive Guide. O’Reilly Media.
[98] Robert Gerstenberger, Maciej Besta, and Torsten Hoefler. 2014. Enabling Highly-Scalable Remote Memory Access
Programming with MPI-3 One Sided. Scientific Programming 22, 2 (2014), 75–91.
[99] Lukas Gianinazzi, Pavel Kalvoda, Alessandro De Palma, Maciej Besta, and Torsten Hoefler. 2018. Communication-
Avoiding Parallel Minimum Cuts and Connected Components. SIGPLAN Not. 53, 1 (Feb 2018), 219–232.
[100] Niels Gleinig, Maciej Besta, and Torsten Hoefler. 2022. I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix
Multiplication. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS ’22). IEEE,
36–46.
[101] Google. 2018. Graphd. Available at https://github.com/google/graphd. (2018).
[102] Graph Story Inc. 2018. Graph Story. Available at https://github.com/graphstory. (2018).
[103] Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Martin Schuster,
Petra Selmer, and Hannes Voigt. 2019. Updating Graph Databases with Cypher. Proc. VLDB Endow. 12, 12 (Aug. 2019),
2242–2254.
[104] Alastair Green, Martin Junghanns, Max Kießling, Tobias Lindaaker, Stefan Plantikow, and Petra Selmer. 2018. open-
Cypher: New Directions in Property Graph Querying. In Proceedings of the Joint EDBT/ICDT Workshops (EDBT ’18).
520–523.
[105] L. Guzenda. 2000. Objectivity/DB - A high performance object database architecture. In Proceedings of the Workshop
on High Performance Object Databases (HIPOD ’00).
[106] Jing Han, E Haihong, Guan Le, and Jian Du. 2011. Survey on NoSQL database. In Proceedings of the 6th International
Conference on Pervasive Computing and Applications (ICPCA ’11). IEEE, 363–366.
[107] Minyang Han, Khuzaima Daudjee, Khaled Ammar, M. Tamer Özsu, Xingfang Wang, and Tianqi Jin. 2014. An
Experimental Comparison of Pregel-like Graph Processing Systems. Proc. VLDB Endow. 7, 12 (Aug. 2014), 1047–1058.
[108] Andreas Harth, Jürgen Umbrich, Aidan Hogan, and Stefan Decker. 2007. YARS2: A Federated Repository for Querying
Graph Structured Data from the Web. In Proceedings of the International Semantic Web Conference/Asian Semantic Web
Conference (ISWC/ASWC ’07), Karl Aberer, Key-Sun Choi, Natasha Noy, Dean Allemang, Kyung-Il Lee, Lyndon Nixon,
Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux (Eds.).
Lecture Notes in Computer Science, Vol. 4825. Springer, 211–224.
[109] Olaf Hartig. 2014. Reconciliation of RDF* and Property Graphs. (2014). arXiv:1409.3288
[110] Olaf Hartig. 2017. RDF* and SPARQL*: An Alternative Approach to Annotate Statements in RDF. In Proceedings of
the International Semantic Web Conference (Posters, Demos & Industry Track) (ISWC ’17).
[111] Olaf Hartig. 2019. Foundations to Query Labeled Property Graphs using SPARQL*. In Joint Proceedings of the 1st
International Workshop On Semantics For Transport and the 1st International Workshop on Approaches for Making Data
Interoperable (SEM4TRA-AMAR ’19).
[112] Olaf Hartig and Jorge Pérez. 2018. Semantics and Complexity of GraphQL. In Proceedings of the World Wide Web
Conference (WWW ’18). 1155–1164.
[113] Jonathan Hayes. 2004. A Graph Model for RDF. Diploma Thesis. Technische Universität Darmstadt, Universidad de
Chile.
[114] Joseph M. Hellerstein and Michael Stonebraker. 2005. Readings in Database Systems. MIT Press.
[115] Jeffrey A. Hoffer, Venkataraman Ramesh, and Heikki Topi. 2011. Modern Database Management. Pearson.
[116] Florian Holzschuher and René Peinl. 2013. Performance of Graph Query Languages: Comparison of Cypher, Gremlin
and Native Access in Neo4j. In Proceedings of the Joint EDBT/ICDT Workshops (EDBT ’13). ACM, 195–204.
[117] Sungpack Hong, Siegfried Depner, Thomas Manhardt, Jan Van Der Lugt, Merijn Verstraaten, and Hassan Chafi. 2015.
PGX.D: A Fast Distributed Graph Processing Engine. In Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis (SC ’15). ACM, Article 58, 12 pages.
[118] InfiniBand Trade Association. 2015. InfiniBand: Architecture Specification 1.3. (2015).
[119] InfoGrid. 2008. The InfoGrid Graph Database. Available at http://infogrid.org. (2008).
[120] Borislav Iordanov. 2010. HyperGraphDB: A Generalized Graph Database. In Proceedings of the International Conference
on Web-Age Information Management (WAIM ’10), Heng Tao Shen, Jian Pei, M. Tamer Özsu, Lei Zou, Jiaheng Lu,
Tok-Wang Ling, Ge Yu, Yi Zhuang, and Jie Shao (Eds.). Lecture Notes in Computer Science, Vol. 6185. Springer, 25–36.
[121] Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardt, Hassan
Chafi, Mihai Capotă, Narayanan Sundaram, Michael Anderson, Ilie Gabriel Tănase, Yinglong Xia, Lifeng Nai, and
Peter Boncz. 2016. LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed
Platforms. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1317–1328.
[122] Bin Jiang. 2011. A Short Note on Data-Intensive Geospatial Computing. In Information Fusion and Geographic
Information Systems, Vasily V. Popovich, Christophe Claramunt, Thomas Devogele, Manfred Schrenk, and Kyrill
Korolenko (Eds.). Lecture Notes in Geoinformation and Cartography, Vol. 5. Springer, 13–17.
[123] Alekh Jindal, Samuel Madden, Malú Castellanos, and Meichun Hsu. 2015. Graph Analytics Using Vertica Relational
Database. In Proceedings of the International Conference on Big Data (Big Data ’15). IEEE, 1191–1200.
[124] Salim Jouili and Valentin Vansteenberghe. 2013. An Empirical Comparison of Graph Databases. In Proceedings of the
International Conference on Social Computing (SocialCom ’13). IEEE, 708–715.
[125] Martin Junghanns, André Petermann, Martin Neumann, and Erhard Rahm. 2017. Management and Analysis of Big
Graph Data: Current Systems and Open Challenges. In Handbook of Big Data Technologies, Albert Y. Zomaya and
Sherif Sakr (Eds.). Springer, 457–505.
[126] Vasiliki Kalavri, Vladimir Vlassov, and Seif Haridi. 2017. High-Level Programming Abstractions for Distributed Graph
Processing. IEEE Transactions on Knowledge and Data Engineering 30, 2 (2017), 305–324.
[127] U. Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos. 2012. Gbase: An Efficient Analysis
Platform for Large Graphs. The VLDB Journal 21, 5 (Oct. 2012), 637–650.
[128] Chathura Kankanamge, Siddhartha Sahu, Amine Mhedhbi, Jeremy Chen, and Semih Salihoglu. 2017. Graphflow: An
Active Graph Database. In Proceedings of the International Conference on Management of Data (SIGMOD ’17). ACM,
1695–1698.
[129] Michael Kay. 2001. XSLT Programmer’s Reference. Wrox Press.
[130] Jeremy Kepner, Peter Aaltonen, David Bader, Aydın Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj
Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Carl Yang, John D. Owens, Marcin Zalewski,
Timothy Mattson, and Jose Moreira. 2016. Mathematical Foundations of the GraphBLAS. In High Performance Extreme
Computing Conference (HPEC ’16). IEEE, 1–9.
[131] Vaibhav Khadilkar, Murat Kantarcioglu, Bhavani Thuraisingham, and Paolo Castagna. 2012. Jena-HBase: A Dis-
tributed, Scalable and Efficient RDF Triple Store. In Proceedings of the International Semantic Web Conference (Posters &
Demonstrations Track) (ISWC ’12). 85–88.
[132] John Kim, William J. Dally, Steve Scott, and Dennis Abts. 2008. Technology-Driven, Highly-Scalable Dragonfly
Topology. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA ’08). IEEE, 77–88.
[133] Vojtech Kolomicenko. 2013. Analysis and Experimental Comparison of Graph Databases. Master's Thesis. Charles
University, Prague.
[134] Joseph B. Kruskal. 1956. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proc.
Amer. Math. Soc. 7, 1 (1956), 48–50.
[135] Vijay Kumar and Anjan Babu. 2015. Domain Suitable Graph Database Selection: A Preliminary Report. In Proceedings
of the 3rd International Conference on Advances in Engineering Sciences & Applied Mathematics (ICAESAM ’15). 26–29.
[136] Rohit Kumar Kaliyar. 2015. Graph databases: A survey. In Proceedings of the International Conference on Computing,
Communication & Automation (ICCCA ’15). IEEE, 785–790.
[137] Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos
Ziogas, Maciej Besta, and Torsten Hoefler. 2021. Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O
Lower Bounds for Statically Analyzable Programs. In Proceedings of the 33rd Symposium on Parallelism in Algorithms
and Architectures (SPAA ’21). ACM, 328–339.
[138] Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, and Torsten Hoefler.
2021. On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization. In Proceedings of the
26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’21). 463–464.
[139] Grzegorz Kwasniewski, Marko Kabić, Maciej Besta, Joost VandeVondele, Raffaele Solcà, and Torsten Hoefler. 2019.
Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix-Matrix Multiplication. In Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19). ACM, Article 24, 22 pages.
[140] Avinash Lakshman and Prashant Malik. 2010. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper.
Syst. Rev. 44, 2 (April 2010), 35–40.
[141] LambdaZen LLC. 2017. Bitsy. Available at https://github.com/lambdazen/bitsy and https://bitbucket.org/lambdazen/
bitsy/wiki/Home. (2017).
[142] Ora Lassila, Michael Schmidt, Olaf Hartig, Brad Bebee, Dave Bechberger, Willem Broekema, Ankesh Khandelwal,
Kelvin Lawrence, Carlos Manuel Lopez Enriquez, Ronak Sharda, and Bryan Thompson. 2023. The OneGraph Vision:
Challenges of Breaking the Graph Model Lock-In. Semantic Web 14, 1 (2023), 125–134.
[143] Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler,
Xiaosong Ma, Xin Liu, Weimin Zheng, and Jingfang Xu. 2018. ShenTu: Processing Multi-Trillion Edge Graphs on
Millions of Cores in Seconds. In Proceedings of the International Conference for High Performance Computing, Networking,
Storage and Analysis (SC ’18). IEEE, Article 56, 11 pages.
[144] Linux Foundation. 2018. JanusGraph. Available at http://janusgraph.org/. (2018).
[145] Matteo Lissandrini, Martin Brugnara, and Yannis Velegrakis. 2017. An Evaluation Methodology and Experimental
Comparison of Graph Databases. Technical Report. University of Trento.
[146] Matteo Lissandrini, Martin Brugnara, and Yannis Velegrakis. 2018. Beyond Macrobenchmarks: Microbenchmark-Based
Graph Database Evaluation. Proc. VLDB Endow. 12, 4 (Dec. 2018), 390–403.
[147] Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan W. Berry. 2007. Challenges in Parallel Graph
Processing. Parallel Processing Letters 17, 1 (2007), 5–20.
[148] Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz
Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. In Proceedings of the International Conference on
Management of Data (SIGMOD ’10). ACM, 135–146.
[149] MariaDB. 2010. OQGRAPH. Available at https://mariadb.com/kb/en/oqgraph-storage-engine/. (2010).
[150] MarkLogic Corporation. 2005. MarkLogic. Available at https://www.marklogic.com. (2005).
[151] Norbert Martínez-Bazan, M. Ángel Águila Lorente, Victor Muntés-Mulero, David Dominguez-Sal, Sergio Gómez-
Villamor, and Josep-L. Larriba-Pey. 2012. Efficient Graph Management Based On Bitmap Indices. In Proceedings of the
16th International Database Engineering & Applications Symposium (IDEAS ’12). ACM, 110–119.
[152] József Marton, Gábor Szárnyas, and Dániel Varró. 2017. Formalising openCypher Graph Queries in Relational Algebra.
In Proceedings of the European Conference on Advances in Databases and Information Systems (ADBIS ’17), Mārīte
Kirikova, Kjetil Nørvåg, and George A. Papadopoulos (Eds.). Lecture Notes in Computer Science, Vol. 10509. Springer,
182–196.
[153] Kristyn J. Maschhoff, Robert Vesse, Sreenivas R. Sukumar, Michael F. Ringenburg, and James Maltby. 2017. Quantifying
Performance of CGE: A Unified Scalable Pattern Mining and Search System. In Proceedings of the Cray User Group
(CUG ’17).
[154] Robert Campbell McColl, David Ediger, Jason Poovey, Dan Campbell, and David A. Bader. 2014. A Performance
Evaluation of Open Source Graph Databases. In Proceedings of the First Workshop on Parallel Programming for Analytics
Applications (PPAA ’14). ACM, 11–18.
[155] Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking Like a Vertex: A Survey of Vertex-Centric
Frameworks for Large-Scale Distributed Graph Processing. ACM Comput. Surv. 48, 2, Article 25 (Oct. 2015), 39 pages.
[156] Memgraph Ltd. 2018. Memgraph. Available at https://memgraph.com/. (2018).
[157] Scott M. Meyer, Jutta Degener, John Giannandrea, and Barak Michener. 2010. Optimizing Schema-Last Tuple-
Store Queries in Graphd. In Proceedings of the International Conference on Management of Data (SIGMOD ’10). ACM,
1047–1056.
[158] Amine Mhedhbi, Matteo Lissandrini, Laurens Kuiper, Jack Waudby, and Gábor Szárnyas. 2021. LSQB: A Large-Scale
Subgraph Query Benchmark. In Proceedings of the 4th ACM SIGMOD Joint International Workshop on Graph Data
Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-NDA ’21). Article 8,
11 pages.
[159] Amine Mhedhbi and Semih Salihoglu. 2019. Optimizing Subgraph Queries by Combining Binary and Worst-Case
Optimal Joins. Proc. VLDB Endow. 12, 11 (July 2019), 1692–1704.
[160] Microsoft. 2017. Microsoft SQL Server 2017. Available at https://www.microsoft.com/en-us/sql-server/sql-server-2017.
(2017).
[161] Microsoft. 2018. Azure Cosmos DB. Available at https://azure.microsoft.com/en-us/services/cosmos-db/. (2018).
[162] Bruce Momjian. 2001. PostgreSQL: Introduction and Concepts. Addison-Wesley.
[163] Thomas Mueller. 2005. H2 Database Engine. Available at http://www.h2database.com. (2005).
[164] Networked Planet Limited. 2018. BrightstarDB. Available at http://brightstardb.com/. (2018).
[165] Thomas Neumann and Gerhard Weikum. 2010. X-RDF-3X: Fast Querying, High Update Rates, and Consistency for
RDF Databases. Proc. VLDB Endow. 3, 1–2 (Sept. 2010), 256–263.
[166] Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2018. Worst-Case Optimal Join Algorithms. J. ACM 65, 3,
Article 16 (March 2018), 40 pages.
[167] Objectivity Inc. 2018. ThingSpan. Available at https://www.objectivity.com/products/thingspan/. (2018).
[168] Mike Olson, Keith Bostic, and Margo Seltzer. 1999. Berkeley DB. In Proceedings of the USENIX Annual Technical
Conference (ATC ’99).
[169] Ontotext. 2018. GraphDB. Available at https://www.ontotext.com/products/graphdb/. (2018).
[170] OpenLink. 2018. Virtuoso. Available at https://virtuoso.openlinksw.com/. (2018).
[171] Oracle. 2018. Oracle Spatial and Graph. Available at https://www.oracle.com/database/technologies/spatialandgraph.
html. (2018).
[172] Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun. 2015. Making Sense of
Performance in Data Analytics Frameworks. In Proceedings of the 12th USENIX Symposium on Networked Systems
Design and Implementation (NSDI ’15). 293–307.
[173] M. Tamer Özsu. 2016. A survey of RDF data management systems. Frontiers of Computer Science 10, 3 (2016), 418–432.
[174] Anil Pacaci, Alice Zhou, Jimmy Lin, and M. Tamer Özsu. 2017. Do We Need Specialized Graph Databases? Bench-
marking Real-Time Social Networking Applications. In Proceedings of the Fifth International Workshop on Graph
Data-Management Experiences & Systems (GRADES ’17). ACM, Article 12, 7 pages.
[175] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing
Order to the Web. Technical Report. Stanford InfoLab.
[176] N.S. Patil, P. Kiran, N.P. Kavya, and K.M. Naresh Patel. 2018. A Survey on Graph Database Management Techniques
for Huge Unstructured Data. International Journal of Electrical and Computer Engineering 8, 2 (2018), 1140–1149.
[177] Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2009. Semantics and Complexity of SPARQL. ACM Trans.
Database Syst. 34, 3, Article 16 (Sept. 2009), 45 pages.
[178] Hasso Plattner. 2009. A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database. In
Proceedings of the International Conference on Management of Data (SIGMOD ’09). ACM, 1–2.
[179] Tomasz Pluskiewicz. 2022. RDF String. Available at https://github.com/tpluscode/rdf-string. (2022).
[180] Jaroslav Pokorný. 2015. Graph Databases: Their Power and Limitations. In Proceedings of the IFIP International
Conference on Computer Information Systems and Industrial Management (CISIM ’15), Khalid Saeed and Wladyslaw
Homenda (Eds.). Lecture Notes in Computer Science, Vol. 9339. Springer, 58–69.
[181] Profium. 2018. Profium Sense. Available at https://www.profium.com/en/products/graph-database/. (2018).
[182] Roshan Punnoose, Adina Crainiceanu, and David Rapp. 2012. Rya: A Scalable RDF Triple Store for the Clouds. In
Proceedings of the 1st International Workshop on Cloud Intelligence (Cloud-I ’12). ACM, Article 4, 8 pages.
[183] Ashish Rana. 2019. Detailed Introduction: Redis Modules, from Graphs to Machine Learning (Part 1). Available at
https://medium.com/@ashishrana160796/15ce9ff1949f. (2019).
[184] Michel Raynal. 2013. Using Semaphores to Solve the Readers-Writers Problem. In Concurrent Programming: Algorithms,
Principles, and Foundations. Springer, Chapter 3.2.4, 74–78.
[185] Redis Labs. 2009. Redis. Available at https://redis.io/. (2009).
[186] Redis Labs. 2018. RedisGraph. Available at https://redis.io/docs/stack/graph/. (2018).
[187] Christopher D. Rickett, Utz-Uwe Haus, James Maltby, and Kristyn J. Maschhoff. 2018. Loading and Querying a Trillion
RDF Triples with Cray Graph Engine on the Cray XC. In Proceedings of the Cray User Group (CUG ’18).
[188] Ian Robinson, Jim Webber, and Emil Eifrem. 2015. Graph Database Internals. In Graph Databases (2nd ed.). O’Reilly,
Chapter 7, 149–170.
[189] Marko A. Rodriguez. 2015. The Gremlin Graph Traversal Machine and Language. In Proceedings of the 15th Symposium
on Database Programming Languages (DBPL ’15). ACM, 1–10.
[190] Shahin Roozkhosh, Denis Hoornaert, Ju Hyoung Mun, Tarikul Islam Papon, Ahmed Sanaullah, Ulrich Drepper, Renato
Mancuso, and Manos Athanassoulis. 2021. Relational Memory: Native In-Memory Accesses on Rows and Columns.
(2021). arXiv:2109.14349
[191] Nicholas P. Roth, Vasileios Trigonakis, Sungpack Hong, Hassan Chafi, Anthony Potter, Boris Motik, and Ian Horrocks.
2017. PGX.D/Async: A Scalable Distributed Graph Pattern Matching Engine. In Proceedings of the Fifth International
Workshop on Graph Data-Management Experiences & Systems (GRADES ’17). ACM, Article 7, 6 pages.
[192] Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out Graph
Processing from Secondary Storage. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP ’15).
ACM, 410–424.
[193] Michael Rudolf, Marcus Paradies, Christof Bornhövd, and Wolfgang Lehner. 2013. The Graph Story of the SAP HANA
Database. In Datenbanksysteme für Business, Technologie und Web, Volker Markl, Gunter Saake, Kai-Uwe Sattler, Gregor
Hackenbroich, Bernhard Mitschang, Theo Härder, and Veit Köppen (Eds.). Gesellschaft für Informatik e.V., 403–420.
[194] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M. Tamer Özsu. 2017. The Ubiquity of Large
Graphs and Surprising Challenges of Graph Processing. Proc. VLDB Endow. 11, 4 (Dec. 2017), 420–431.
[195] SAP. 2010. SAP HANA. Available at https://www.sap.com/products/technology-platform/hana.html. (2010).
[196] Yuya Sasaki, George Fletcher, and Makoto Onizuka. 2022. Language-aware Indexing for Conjunctive Path Queries. In Proceedings of
the 38th International Conference on Data Engineering (ICDE ’22). IEEE, 661–673.
[197] Satu Elisa Schaeffer. 2007. Survey: Graph Clustering. Computer Science Review 1, 1 (Aug. 2007), 27–64.
[198] Patrick Schmid, Maciej Besta, and Torsten Hoefler. 2016. High-Performance Distributed RMA Locks. In Proceedings of
the 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’16). ACM, 19–30.
[199] Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a Memory Cloud. In Proceedings
of the International Conference on Management of Data (SIGMOD ’13). ACM, 505–516.
[200] Sanjay Sharma. 2014. Cassandra Design Patterns. Packt Publishing.
[201] Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and Qiang-Sheng Hua. 2018. Graph
Processing on GPUs: A Survey. ACM Comput. Surv. 50, 6, Article 81 (Jan. 2018), 35 pages.
[202] Komal Singh and Vikram Singh. 2016. Graph pattern matching: A brief survey of challenges and research directions.
In Proceedings of the 3rd International Conference on Computing for Sustainable Global Development (INDIACom ’16).
199–204.
[203] solid IT GmbH. 2023. System Properties Comparison: Neo4j vs. Redis. Available at https://db-engines.com/en/system/
Neo4j%3BRedis. (2023).
[204] Edgar Solomonik, Maciej Besta, Flavio Vella, and Torsten Hoefler. 2017. Scaling Betweenness Centrality Using
Communication-Efficient Sparse Matrix Multiplication. In Proceedings of the International Conference for High Perfor-
mance Computing, Networking, Storage and Analysis (SC ’17). ACM, Article 47, 14 pages.
[205] Edgar Solomonik, Erin Carson, Nicholas Knight, and James Demmel. 2014. Tradeoffs Between Synchronization,
Communication, and Work in Parallel Linear Algebra Computations. Technical Report.
[206] Edgar Solomonik and Torsten Hoefler. 2015. Sparse Tensor Algebra as a Parallel Programming Model. (2015).
arXiv:1512.00066
[207] Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2018. GraphR: Accelerating Graph Processing
Using ReRAM. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA ’18).
IEEE, 531–543.
[208] Stardog Union. 2018. Stardog. Available at https://www.stardog.com/. (2018).
[209] Benjamin A. Steer, Alhamza Alnaimi, Marco A. B. F. G. Lotz, Felix Cuadrado, Luis M. Vaquero, and Joan Varvenne.
2017. Cytosm: Declarative Property Graph Queries Without Data Migration. In Proceedings of the Fifth International
Workshop on Graph Data-Management Experiences & Systems (GRADES ’17). ACM, Article 4, 6 pages.
[210] Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, and Guotong Xie. 2015. SQLGraph:
An Efficient Relational-Based Property Graph Store. In Proceedings of the International Conference on Management of
Data (SIGMOD ’15). ACM, 1887–1901.
[211] Narayanan Sundaram, Nadathur Satish, Md. Mostofa Ali Patwary, Subramanya R. Dulloor, Michael J. Anderson,
Satya Gautam Vadlamudi, Dipankar Das, and Pradeep Dubey. 2015. GraphMat: High Performance Graph Analytics
Made Productive. Proc. VLDB Endow. 8, 11 (July 2015), 1214–1225.
[212] Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton, Marcus Paradies, Moritz Kaufmann, Orri Erling,
Peter Boncz, Vlad Haprian, and János Benjamin Antal. 2018. An Early Look at the LDBC Social Network Benchmark’s
Business Intelligence Workload. In Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data
Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-NDA ’18). Article 9,
11 pages.
[213] Gábor Szárnyas, Jack Waudby, Benjamin A. Steer, Dávid Szakállas, Altan Birler, Mingxi Wu, Yuchen Zhang, and Peter
Boncz. 2022. The LDBC Social Network Benchmark: Business Intelligence Workload. Proc. VLDB Endow. 16, 4 (Dec.
2022), 877–890.
[214] Ruben Taelman. 2022. RDF String Turtle. Available at https://github.com/rubensworks/rdf-string-ttl.js. (2022).
[215] Adrian Tate, Amir Kamil, Anshu Dubey, Armin Größlinger, Brad Chamberlain, Brice Goglin, H. Edwards, Chris
Newburn, David Padua, Didem Unat, Emmanuel Jeannot, Frank Hannig, Tobias Gysi, Hatem Ltaief, James Sexton,
Jesús Labarta, John Shalf, Karl Fürlinger, Kathryn O’Brien, and Vitus Leung. 2014. Programming Abstractions for Data
Locality. In PADAL Workshop.
[216] Daniel ten Wolde, Tavneet Singh, Gábor Szárnyas, and Peter Boncz. 2023. DuckPGQ: Efficient Property Graph Queries
in an analytical RDBMS. In Proceedings of the Conference on Innovative Data Systems Research (CIDR ’23).
[217] Yuanyuan Tian, En Liang Xu, Wei Zhao, Mir Hamid Pirahesh, Sui Jun Tong, Wen Sun, Thomas Kolanko, Md.
Shahidul Haque Apu, and Huijuan Peng. 2020. IBM Db2 Graph: Supporting Synergistic and Retrofittable Graph Queries
Inside IBM Db2. In Proceedings of the International Conference on Management of Data (SIGMOD ’20). ACM, 345–359.
[218] TigerGraph Inc. 2018. TigerGraph. Available at https://www.tigergraph.com/. (2018).
[219] Twitter. 2010. FlockDB. Available at https://github.com/twitter-archive/flockdb. (2010).
[220] Aparna Vaikuntam and Vinodh Kumar Perumal. 2014. Evaluation of Contemporary Graph Databases. In Proceedings
of the 7th India Computing Conference (COMPUTE ’14). ACM, Article 6, 10 pages.
[221] Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. 2016. PGQL: A Property Graph Query
Language. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems
(GRADES ’16). ACM, Article 7, 6 pages.
[222] VelocityDB Inc. 2019. VelocityDB. Available at https://velocitydb.com/. (2019).
[223] VelocityDB Inc. 2019. VelocityGraph. Available at https://velocitydb.com/QuickStartVelocityGraph. (2019).
[224] WhiteDB Team. 2013. WhiteDB. Available at http://whitedb.org/ or https://github.com/priitj/whitedb. (2013).
[225] Konstantinos Xirogiannopoulos, Virinchi Srinivas, and Amol Deshpande. 2017. GraphGen: Adaptive Graph Processing
Using Relational Databases. In Proceedings of the Fifth International Workshop on Graph Data-Management Experiences
& Systems (GRADES ’17). ACM, Article 9, 7 pages.
[226] Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, and James Cheng. 2016. Big Graph Analytics Systems. In
Proceedings of the International Conference on Management of Data (SIGMOD ’16). ACM, 2241–2243.
[227] Robert Yokota. 2016. HGraphDB. Available at https://github.com/rayokota/hgraphdb. (2016).
[228] Pingpeng Yuan, Pu Liu, Buwen Wu, Hai Jin, Wenya Zhang, and Ling Liu. 2013. TripleBit: A Fast and Compact System
for Large Scale RDF Data. Proc. VLDB Endow. 6, 7 (May 2013), 517–528.
[229] Kangfei Zhao and Jeffrey Xu Yu. 2017. All-in-One: Graph Processing in RDBMSs Revisited. In Proceedings of the
International Conference on Management of Data (SIGMOD ’17). ACM, 1165–1180.
[230] Xiaowei Zhu, Guanyu Feng, Marco Serafini, Xiaosong Ma, Jiping Yu, Lei Xie, Ashraf Aboulnaga, and Wenguang Chen.
2020. LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans. Proc. VLDB
Endow. 13, 7 (March 2020), 1020–1034.
[231] Lei Zou, M. Tamer Özsu, Lei Chen, Xuchuan Shen, Ruizhe Huang, and Dongyan Zhao. 2014. GStore: A Graph-Based
SPARQL Query Engine. The VLDB Journal 23, 4 (Aug. 2014), 565–590.