Documentation is also available in PDF format: ArcadeDB-Manual.pdf.
The Next Generation Multi-Model Database Management System. Browse the topics in the sidebar or explore the highlights below.
Multi-Model
Graph, Document, Key/Value, Search, Time Series, Vector, and Geospatial in one engine.
Multi-Language
SQL, Cypher, Gremlin, GraphQL, MongoDB QL, and Redis commands.
Multi-Protocol
HTTP/JSON, PostgreSQL, MongoDB, and Redis wire protocols.
Performance
Built for speed with minimal GC pressure and efficient storage.
ACID Transactions
Full transaction support with isolation and durability guarantees.
High Availability
Leader-follower replication with automatic failover.
- Tutorials: Step-by-step guides to get you started with ArcadeDB.
- How-To Guides: Practical recipes for specific tasks and integrations.
- Explanations: Background concepts, architecture, and design decisions.
- Reference: Complete technical reference for languages, APIs, and configuration.
1. Quick Reference
- Getting Started: Run ArcadeDB with Docker, Multi-Model, Java Tutorial
- Data Models: Graph, Document, Key/Value, Search Engine, Time Series, Vector, Geospatial
- Query Languages: SQL, Cypher, Gremlin, MongoDB QL
- Connectivity: Embedded, HTTP/JSON API, MCP Server, PostgreSQL, MongoDB, Redis
- Tools: Console, Docker, Kubernetes, OrientDB Importer
- Links: Website, Source, Latest Release (26.3.1), Pricing & Support
2. Tutorials
2.1. What is ArcadeDB?
ArcadeDB is a new generation of DBMS (DataBase Management System) that runs on virtually every hardware/software configuration. ArcadeDB is multi-model, which means it can work with graphs, documents, and other data models, and it does so extremely fast.
How can it be so fast?
ArcadeDB is written in LLJ ("Low-Level Java"), which means it’s written in Java (Java 21+), but without using high-level APIs. The result is that ArcadeDB allocates few objects on the heap at run-time, so garbage collection only needs to act rarely rather than regularly. At the same time, it is highly portable and leverages the highly optimized Java Virtual Machine. Furthermore, the kernel is built to be efficient on multi-core CPUs by using novel mechanical sympathy techniques.
ArcadeDB is a native graph database:
- No more "joins": relationships are physical links to records
- Traverses parts of, or entire, trees and graphs of records in milliseconds
- Traversal speed is independent of the database size
Cloud DBMS
ArcadeDB was born in the cloud. Even though you can run ArcadeDB embedded or in an on-premise setup, you can spin up an ArcadeDB server/cluster in a few seconds with Docker, Kubernetes, Amazon AWS (coming soon), or Microsoft Azure (coming soon).
Is ArcadeDB FREE?
ArcadeDB Community Edition is truly FREE for any purpose, and is released under the Apache 2.0 license. We love to hear about your projects with ArcadeDB, and any contributions back to ArcadeDB’s open community (reports, patches, test cases, documentation, etc.) are welcome.
Ask yourself: which is more likely to have better quality? A DBMS created and tested by a handful of developers in isolation, or one tested by thousands of developers globally? When code is public, everyone can scrutinize, test, report, and resolve issues. Open source moves faster than the proprietary world.
Professional Support
2.2. Run ArcadeDB
You can run ArcadeDB in the following ways:
- In the cloud, by running an ArcadeDB instance on Amazon AWS, Microsoft Azure, or Google Cloud Engine.
- On-premise, on your own servers; any OS works. You can run with Docker, Podman, Kubernetes, or bare metal.
- On x86(-64), ARM(64), or any other hardware supporting a JRE (Java Runtime Environment).
- Embedded, if you develop with a language that runs on the JVM (Java Virtual Machine).
For the best performance, use ArcadeDB in embedded mode, where it can reach two million insertions per second on common hardware. If you need to scale up with queries, run an HA (high availability) configuration with at least three servers and a load balancer in front. Run ArcadeDB with Kubernetes to get an automatic setup of HA servers with a load balancer in front.
Embedded
This mode is possible only if your application runs in a JVM (Java Virtual Machine). In this configuration ArcadeDB runs in the same JVM as your application, so you completely avoid the client/server communication cost (TCP/IP, marshalling/unmarshalling, etc.). If the JVM that hosts your application crashes, ArcadeDB crashes with it, but don’t worry: ArcadeDB uses a WAL (write-ahead log) to recover partially committed transactions. Your data is safe! Check the Embedded Server section.
Client-Server
This is the classic way people use a DBMS, like with relational databases. The ArcadeDB server exposes HTTP/JSON API, so you can connect to ArcadeDB from any language without even using drivers. Take a look at the API and Driver Reference chapter for more information.
High Availability (HA)
You can spin up as many ArcadeDB servers as you want for an HA setup and scale up with queries, which can be executed on any server. ArcadeDB uses a Raft-based election system to guarantee the consistency of the database. For more information, look at High Availability.
2.2.1. Getting Started
See Installation for guides to install under Linux, macOS, and Windows. Alternatively, you can run ArcadeDB using Docker:
docker run --rm -p 2424:2424 -p 2480:2480 -p 5432:5432 \
--name arcadedb arcadedata/arcadedb:26.3.1
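If the server asks for a root password at first start, you can set one non-interactively via JAVA_OPTS and then talk to the HTTP/JSON API directly. The following is a sketch; the rootPassword setting and the /api/v1/command endpoint reflect a typical setup, and mydb is an illustrative database name:

```
# Set a root password when starting the container, so the server
# does not prompt for one interactively:
docker run --rm -p 2424:2424 -p 2480:2480 -p 5432:5432 \
    -e JAVA_OPTS="-Darcadedb.server.rootPassword=playwithdata" \
    --name arcadedb arcadedata/arcadedb:26.3.1

# Execute a SQL command over the HTTP/JSON API:
curl -u root:playwithdata -X POST http://localhost:2480/api/v1/command/mydb \
    -H "Content-Type: application/json" \
    -d '{"language": "sql", "command": "SELECT 1"}'
```

Studio, the web UI, is then reachable at http://localhost:2480.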
2.3. Multi Model
The ArcadeDB engine supports Graph, Document, Key/Value, Search-Engine, Time-Series, and Vector-Embedding models, so you can use ArcadeDB as a replacement for a product in any of these categories. However, the main reason why users choose ArcadeDB is because of its true Multi-Model DBMS ability, which combines all the features of the above models into one core. This is not just interfaces to the database engine, but rather the engine itself was built to support all models. This is also the main difference to other multi-model DBMSs, as they implement an additional layer with an API, which mimics additional models. However, under the hood, they’re truly only one model, therefore they are limited in speed and scalability.
2.3.1. Graph Model
A graph represents a network-like structure consisting of Vertices (also known as Nodes) interconnected by Edges (also known as Arcs). ArcadeDB’s graph model is represented by the concept of a property graph, which defines the following:
- Vertex: an entity that can be linked with other vertices and has the following mandatory properties:
  - unique identifier
  - set of incoming edges
  - set of outgoing edges
  - label that defines the type of vertex
- Edge: an entity that links two vertices and has the following mandatory properties:
  - unique identifier
  - link to an incoming vertex (also known as head)
  - link to an outgoing vertex (also known as tail)
  - label that defines the type of connection/relationship between head and tail vertex
In addition to mandatory properties, each vertex or edge can also hold a set of custom properties. These properties can be defined by users, which can make vertices and edges appear similar to documents. Furthermore, edges are sorted by the reverse order of insertion, meaning the last edge added is the first when listed, cf. "Last In First Out".
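These concepts can be sketched in ArcadeDB SQL as follows (the type and property names are illustrative):

```sql
-- Define vertex and edge types (the "labels")
CREATE VERTEX TYPE Person;
CREATE EDGE TYPE Knows;

-- Create two vertices with custom properties
CREATE VERTEX Person SET name = 'Alice';
CREATE VERTEX Person SET name = 'Bob';

-- Link them with an edge carrying its own property
CREATE EDGE Knows FROM (SELECT FROM Person WHERE name = 'Alice')
                    TO (SELECT FROM Person WHERE name = 'Bob') SET since = 2020;
```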
In the table below, you can find a comparison between the graph model, the relational data model, and the ArcadeDB graph model:
| Relational Model | Graph Model | ArcadeDB Graph Model |
|---|---|---|
| Table | Vertex and Edge Types | Type |
| Row | Vertex | Vertex |
| Column | Vertex and Edge property | Vertex and Edge property |
| Relationship | Edge | Edge |
2.3.2. Document Model
The data in this model is stored inside documents. A document is a set of key/value pairs (also referred to as fields or properties), where the key allows access to its value. Values can hold primitive data types, embedded documents, or arrays of other values. Documents are not typically forced to have a schema, which can be advantageous, because they remain flexible and easy to modify. Documents are stored in collections, enabling developers to group data as they decide. ArcadeDB uses the concepts of "Types" and "Buckets" as its form of "collections" for grouping documents. This provides several benefits, which we will discuss in further sections of the documentation.
ArcadeDB’s document model also adds the concept of a "Relationship" between documents. With ArcadeDB, you can decide whether to embed documents or link to them directly. When you fetch a document, all the links are automatically resolved by ArcadeDB. This is a major difference from other document databases, like MongoDB or CouchDB, where the developer must handle any and all relationships between the documents themselves.
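A sketch in ArcadeDB SQL of the two options, assuming illustrative Customer and Address document types:

```sql
-- Embedded: the address lives inside the customer record itself
INSERT INTO Customer CONTENT { "name": "Jay", "address": { "city": "Melbourne" } };

-- Linked: store a reference to another record; ArcadeDB resolves
-- the link automatically when the customer document is fetched
INSERT INTO Address SET city = 'Melbourne';
UPDATE Customer SET address = (SELECT FROM Address WHERE city = 'Melbourne')
  WHERE name = 'Jay';
```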
The table below illustrates the comparison between the relational model, the document model, and the ArcadeDB document model:
| Relational Model | Document Model | ArcadeDB Document Model |
|---|---|---|
| Table | Collection | |
| Row | Document | Document |
| Column | Key/value pair | Document property |
| Relationship | not available | |
2.3.3. Key/Value Model
This is the simplest model. Everything in the database can be reached by a key, and the values can be simple or complex types. ArcadeDB supports documents and graph elements as values, allowing for a richer model than what you would normally find in a typical key/value model. The usual Key/Value model provides "buckets" to group key/value pairs in different containers. The most typical use cases of the Key/Value model are:
- POST the value as payload of the HTTP call → /<bucket>/<key>
- GET the value as payload from the HTTP call → /<bucket>/<key>
- DELETE the value by key via the HTTP call → /<bucket>/<key>
The table below illustrates the comparison between the relational model, the Key/Value model, and the ArcadeDB Key/Value model:
| Relational Model | Key/Value Model | ArcadeDB Key/Value Model |
|---|---|---|
| Table | Bucket | |
| Row | Key/Value pair | Document |
| Column | not available | Document field or Vertex/Edge property |
| Relationship | not available | |
2.3.4. Search-Engine Model
The search engine model is based on a full-text variant of the LSM-Tree index. To index each word, the necessary tokenization is performed by the Apache Lucene library. Such a full-text index is created just like any index in ArcadeDB.
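For example, creating such a full-text index might look like this. This is a sketch: the type and property names are illustrative, and FULL_TEXT is assumed to follow ArcadeDB's index-type syntax.

```sql
-- Define a document type with a text property
CREATE DOCUMENT TYPE Article;
CREATE PROPERTY Article.text STRING;

-- Create the full-text index; tokenization is performed by Lucene
CREATE INDEX ON Article (text) FULL_TEXT;
```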
2.3.5. Time-Series Model
ArcadeDB includes a native Time Series engine designed for high-throughput ingestion and fast analytical queries over timestamped data. The Time Series model is integrated directly into the multi-model core — the same database that stores graphs, documents, and key/value pairs can store and query billions of time-stamped samples with specialized columnar compression, SIMD-vectorized aggregation, and automatic lifecycle management.
Key capabilities:
- Columnar storage with Gorilla, Delta-of-Delta, Simple-8b, and Dictionary compression (as low as 0.4 bytes per sample)
- Shard-per-core parallelism with lock-free writes
- InfluxDB Line Protocol ingestion for compatibility with Telegraf, Grafana Agent, and hundreds of collection agents
- Prometheus remote_write / remote_read protocol for drop-in Prometheus backend usage
- PromQL query language with native parser and HTTP-compatible API endpoints
- SQL analytical functions: ts.timeBucket, ts.rate, ts.percentile, ts.interpolate, and more
- Continuous aggregates, retention policies, and downsampling tiers for automatic data lifecycle
- Grafana integration via DataFrame-compatible endpoints
- Studio TimeSeries Explorer with query, schema inspection, ingestion docs, and PromQL tabs
For the full reference, including DDL syntax, ingestion methods, SQL functions, PromQL support, and Grafana integration, see Time Series.
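As a sketch of what an analytical query over this model can look like, using the ts.timeBucket function mentioned above (the measurement name cpu_usage and its columns are illustrative; consult the Time Series reference for the exact DDL and function signatures):

```sql
-- Average CPU usage in 5-minute buckets over the ingested samples
SELECT ts.timeBucket(timestamp, '5m') AS bucket,
       avg(value) AS avgValue
FROM cpu_usage
GROUP BY bucket
ORDER BY bucket;
```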
2.3.6. Vector Model
ArcadeDB provides a high-performance vector indexing solution using the LSMVectorIndex, which is built on the JVector 4.0.0 library. This implements the hierarchical navigable small world (HNSW) algorithm for efficient approximate nearest neighbor (ANN) search on multi-dimensional vector data.
The LSMVectorIndex combines the HNSW algorithm with ArcadeDB’s LSM Tree architecture, providing:
- Persistent Storage: Vector indexes are stored on disk with automatic page management and compaction
- Transaction Support: Full ACID compliance with automatic persistence on transaction commit
- Multiple Similarity Functions: Supports COSINE (default), DOT_PRODUCT, EUCLIDEAN, and others
- SQL Integration: Create and query vector indexes using SQL commands
- Automatic Compaction: Efficiently reclaims disk space through automatic compaction of immutable pages
- High Performance: LSM Tree benefits for write efficiency and space optimization at scale
SQL Example
Create a vector index and query it:
-- Create vertex type and property
CREATE VERTEX TYPE Document;
CREATE PROPERTY Document.content STRING;
CREATE PROPERTY Document.embedding ARRAY_OF_FLOATS;
-- Create vector index with 384 dimensions using COSINE similarity
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE'
};
-- Query for the 10 nearest documents
-- Returns rows with .record (full document) and .distance (0 = identical for COSINE)
SELECT expand(vectorNeighbors('Document[embedding]', $queryVector, 10))
Java Example
Create and query a vector index programmatically:
import com.arcadedb.index.lsm.LSMVectorIndex;
import com.arcadedb.index.lsm.LSMVectorIndexBuilder;
import com.arcadedb.index.vector.VectorSimilarityFunction;
// Create index programmatically
final LSMVectorIndexBuilder builder = new LSMVectorIndexBuilder(
database,
"Document",
new String[]{"embedding"})
.withDimensions(384)
.withSimilarity(VectorSimilarityFunction.COSINE)
.withMaxConnections(16)
.withBeamWidth(100);
final LSMVectorIndex index = builder.create();
// Query the index using SQL
final ResultSet resultSet = database.query("sql",
"SELECT expand(vectorNeighbors('Document[embedding]', ?, 10))",
queryVector);
Configuration Parameters
When creating LSMVectorIndex instances, the following parameters can be configured:
- dimensions: The dimensionality of the vectors (must match your embedding model output)
- similarity: The distance function for similarity calculation (COSINE, DOT_PRODUCT, EUCLIDEAN, etc.)
- maxConnections: Maximum number of connections per layer in the HNSW graph (default: 16, increase for better recall)
- beamWidth: Beam width for approximate nearest neighbor search (default: 100, increase for more accurate results)
Supported Similarity Functions
The supported measures include COSINE (the default), DOT_PRODUCT, and EUCLIDEAN (L2 distance).
For more information on vector embeddings, see the Vector Embeddings section.
2.5. Embedded Server
Embedding the server in your JVM gives you all the benefits of working with ArcadeDB in embedded mode (zero cost for network transport and marshalling), while still keeping the database accessible from the outside, such as via Studio, the remote API, and the PostgreSQL, Redis, and MongoDB drivers.
We call this configuration an "ArcadeDB Box".
First, add the server library in your classpath.
If you’re using Maven include this dependency in your pom.xml file.
<dependency>
<groupId>com.arcadedb</groupId>
<artifactId>arcadedb-server</artifactId>
<version>26.3.1</version>
</dependency>
This library depends on arcadedb-network-<version>.jar.
If you’re using Maven or Gradle, it will be imported automatically as a dependency, otherwise please add also the arcadedb-network library to your classpath.
The arcadedb-server dependency will only start the ArcadeDB server. You will see the HTTP URL for the server along with the port number displayed, for example, http://localhost:2480. However, if you try to access this URL to see the ArcadeDB studio, you will receive a "Not Found" message. This is because the arcadedb-server dependency only adds the embedded server.
If you need to access the ArcadeDB studio to execute graph database queries, then you will need to add the following dependency:
<dependency>
<groupId>com.arcadedb</groupId>
<artifactId>arcadedb-studio</artifactId>
<version>26.3.1</version>
</dependency>
2.5.1. Java 17 notes
Java 17 packages are available through GitHub Packages. To use them, you need to add the GitHub Packages repository to your pom.xml file:
<repositories>
<repository>
<id>github</id>
<url>https://maven.pkg.github.com/ArcadeData/arcadedb</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
Then, you can add the arcadedb-server dependency with the -java17 suffix to use the Java 17 compatible version:
<dependency>
<groupId>com.arcadedb</groupId>
<artifactId>arcadedb-server</artifactId>
<version>26.3.1-java17</version>
</dependency>
2.5.2. Start the server in the JVM
To start an embedded server, create it with an empty configuration, so all settings use their defaults:
ContextConfiguration config = new ContextConfiguration();
ArcadeDBServer server = new ArcadeDBServer(config);
server.start();
To start a server in distributed configuration (with replicas), you can set your settings in the ContextConfiguration:
config.setValue(GlobalConfiguration.HA_SERVER_LIST, "192.168.10.1,192.168.10.2,192.168.10.3");
config.setValue(GlobalConfiguration.HA_REPLICATION_INCOMING_HOST, "0.0.0.0");
config.setValue(GlobalConfiguration.HA_ENABLED, true);
When you embed the server, you should always get the database instance from the server itself.
This ensures there is only one database instance in the entire JVM.
If you try to create or open another database instance via the DatabaseFactory, you will receive an error saying the underlying database is locked by another process.
Database database = server.getDatabase(<URL>);
Or use this if you want to create the database when it does not exist:
Database database = server.getOrCreateDatabase(<URL>);
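Putting the pieces together, a minimal embedded "Box" might look like this. This is a sketch against the API shown above; the database name and schema are illustrative:

```java
import com.arcadedb.ContextConfiguration;
import com.arcadedb.database.Database;
import com.arcadedb.server.ArcadeDBServer;

public class BoxExample {
  public static void main(final String[] args) {
    // Start the server with default settings
    final ContextConfiguration config = new ContextConfiguration();
    final ArcadeDBServer server = new ArcadeDBServer(config);
    server.start();

    // Always obtain the database from the server in embedded-server mode
    final Database database = server.getOrCreateDatabase("mydb");

    database.transaction(() -> {
      // Create the type if missing, then store a document
      database.getSchema().getOrCreateDocumentType("Customer");
      database.newDocument("Customer").set("name", "Jay").save();
    });

    // Stop the server on shutdown; this also closes the databases
    server.stop();
  }
}
```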
2.5.3. Create custom HTTP commands
You can easily add custom HTTP commands on ArcadeDB’s Undertow HTTP server by creating a Server Plugin (look at the MongoDBProtocolPlugin implementation for a real example) and implementing the registerAPI method.
Example for the HTTP POST API /myapi/test:
package com.yourpackage;
public class MyTest implements ServerPlugin {
// ...
@Override
public void registerAPI(HttpServer httpServer, final PathHandler routes) {
routes.addPrefixPath("/myapi",//
Handlers.routing()//
.get("/account/{id}", new RetrieveAccount(this))// YOU CAN ADD YOUR HANDLERS UNDER THE SAME PREFIX PATH
.post("/test/{name}", new MyTestAPI(this))//
);
}
}
You can use GET, POST or any HTTP methods when you register your handler.
Note that multiple handlers are defined under the same prefix /myapi.
Below you can find the implementation of a "Test" handler that can be called by using the HTTP POST method against the URL /myapi/test/{name}, where {name} is the name passed as an argument.
Note that the MyTestAPI class extends DatabaseAbstractHandler to receive the database instance as a parameter.
If the user is not authenticated, the execute() method is not called at all, and an authentication error is returned. If you don’t need to access the database, you can extend the AbstractHandler class instead.
public class MyTestAPI extends DatabaseAbstractHandler {
public MyTestAPI(final HttpServer httpServer) {
super(httpServer);
}
@Override
public void execute(final HttpServerExchange exchange, ServerSecurityUser user, final Database database) throws IOException {
final Deque<String> namePar = exchange.getQueryParameters().get("name");
if (namePar == null || namePar.isEmpty()) {
exchange.setStatusCode(400);
exchange.getResponseSender().send("{ \"error\" : \"name is null\"}");
return;
}
final String name = namePar.getFirst();
// DO SOMETHING MEANINGFUL HERE
// ...
exchange.setStatusCode(204);
exchange.getResponseSender().send("");
}
}
At startup, the ArcadeDB server will instantiate your plugin and register your API.
To start the server with your plugin, register the plugin’s fully qualified class name in the arcadedb.server.plugins setting:
Example:
$ java ... -Darcadedb.server.plugins=MyPlugin:com.yourpackage.MyPlugin ...
2.5.4. HTTPS connection
In order to enable HTTPS on the ArcadeDB server, set the following configuration before the server starts:
configuration.setValue(GlobalConfiguration.NETWORK_USE_SSL, true);
configuration.setValue(GlobalConfiguration.NETWORK_SSL_KEYSTORE, "src/test/resources/master.jks");
configuration.setValue(GlobalConfiguration.NETWORK_SSL_KEYSTORE_PASSWORD, "keypassword");
configuration.setValue(GlobalConfiguration.NETWORK_SSL_TRUSTSTORE, "src/test/resources/master.jks");
configuration.setValue(GlobalConfiguration.NETWORK_SSL_TRUSTSTORE_PASSWORD, "storepassword");
Where:
- NETWORK_USE_SSL enables SSL support for the HTTP server
- NETWORK_SSL_KEYSTORE is the path to the keystore file
- NETWORK_SSL_KEYSTORE_PASSWORD is the keystore password
- NETWORK_SSL_TRUSTSTORE is the path to the truststore file
- NETWORK_SSL_TRUSTSTORE_PASSWORD is the truststore password
Note that the default port for HTTPS is configured via the global setting:
GlobalConfiguration.SERVER_HTTPS_INCOMING_PORT
By default it ranges from 2490 to 2499 (the port is incremented if it’s already occupied).
Note: if the HTTP or HTTPS port is already in use, the next ports are tried. With the default ranges of 2480-2489 for HTTP and 2490-2499 for HTTPS, if port 2480 is not available, then the next port for both HTTP and HTTPS will be used, namely 2481 for HTTP and 2491 for HTTPS.
2.6. 10-Minute Tutorial (Embedded)
You can create a new database from scratch or open an existing one.
Most of the API works in both synchronous and asynchronous modes.
The asynchronous API is available from the <db>.async() object.
To start from scratch, let’s create a new database.
The entry point is the DatabaseFactory class, which allows you to create and open a database.
DatabaseFactory databaseFactory = new DatabaseFactory("/databases/mydb");
Pass the path in the file system where you want the database to be stored.
In this case a new directory 'mydb' will be created under the path /databases/ of your file system.
You can also use a relative path like databases/mydb.
A DatabaseFactory object doesn’t hold the Database instances.
It’s up to you to close them once you have finished.
2.6.1. Create a new database
To create a new database from scratch, use the .create() method in DatabaseFactory class.
If the database already exists, an exception is thrown.
Syntax:
DatabaseFactory databaseFactory = new DatabaseFactory("/databases/mydb");
try( Database db = databaseFactory.create(); ){
// YOUR CODE
}
The database instance db is ready to be used inside the try block.
The Database instance implements Java’s AutoCloseable interface, which means the database is closed automatically when the Database variable goes out of scope.
2.6.2. Open an existing database
If you want to open an existing database, use the open() method instead:
DatabaseFactory databaseFactory = new DatabaseFactory("/databases/mydb");
try( Database db = databaseFactory.open(); ){
// YOUR CODE
}
By default a database is opened in READ_WRITE mode, but you can open it in READ_ONLY mode in this way:
databaseFactory.open(PaginatedFile.MODE.READ_ONLY);
Using READ_ONLY denies any changes to the database.
This is the suggested mode if you’re going to execute only reads and queries, or if you are opening a database from a read-only file system like a DVD or a shared read-only directory.
By letting ArcadeDB know that you’re not changing the database, many optimizations can be applied; for example, in a distributed high-availability configuration, a REPLICA server could be used instead of the busy MASTER.
If you open a database in READ_ONLY mode, no lock file is created, so the same database can be opened in READ_ONLY mode by another process at the same time.
2.6.3. Write your first transaction
Whether you create or open a database, in order to use it you have to execute your code inside a transaction, in this way:
try( Database db = databaseFactory.open(); ){
db.transaction( (tx) -> {
// YOUR CODE HERE
});
}
Using the database’s auto-close and the transaction() method frees you from managing begin/commit/rollback/close operations as you would with a traditional DBMS.
However, you can control the transaction with explicit methods if you prefer.
This code block is equivalent to the previous one:
Database db = databaseFactory.open();
try {
db.begin();
// YOUR CHANGES HERE
db.commit();
} catch (Exception e) {
db.rollback();
} finally {
db.close();
}
Remember that every change in the database must be executed inside a transaction.
ArcadeDB is a fully transactional DBMS, ACID compliant.
The usage of transactions is like with a relational DBMS: .begin() starts a new transaction and .commit() commits all the changes to the database; if there is an error (like a conflict on updating the same record), the entire transaction is automatically rolled back and none of your changes end up in the database.
In case you want to manually rollback the transaction at a certain point (like when you have an error in your application code), you can call .rollback().
Once you have your database instance (in this tutorial the variable db is used), you can create/update/delete records and execute queries.
2.6.4. Write your first document object
Let’s start populating the database by creating our first document of type "Customer". What is a document? A document is like a map of entries. Documents can be nested, and entries can have different types of values, such as strings, integers, floats, etc. You can think of a document as a JSON document, but it’s stored in a binary form in the database. By the way, if you use JSON in your application, ArcadeDB provides an easy API to convert a document to and from JSON.
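As a sketch of that JSON conversion (the toJSON() and fromJSON() method names are assumptions here; check the API reference for the exact signatures):

```java
// Build a document and serialize it to JSON
MutableDocument customer = db.newDocument("Customer");
customer.set("name", "Jay").set("surname", "Miner");
String json = customer.toJSON().toString();

// Populate a new document from a JSON string
MutableDocument imported = db.newDocument("Customer");
imported.fromJSON(new JSONObject("{\"name\":\"Jay\",\"surname\":\"Miner\"}"));
imported.save();
```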
In ArcadeDB it’s mandatory to specify a type when you want to create a document, a vertex, or an edge.
Let’s create the new document type "Customer" without any properties:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
// CREATE THE CUSTOMER TYPE
db.getSchema().createDocumentType("Customer");
});
}
Once the "Customer" type has been created, we can create our first document:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
// CREATE A CUSTOMER INSTANCE
MutableDocument customer = db.newDocument("Customer");
customer.set("name", "Jay");
customer.set("surname", "Miner");
customer.save(); // THE DOCUMENT IS SAVED IN THE DATABASE ONLY WHEN `.save()` IS CALLED
});
}
You can create types and records in the same transaction.
2.6.5. Execute a Query
Once we have our database populated, how to extract data from it? Simple, with a query. Example of executing a prepared query:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
ResultSet result = db.query("SQL", "select from V where age > ? and city = ?", 18, "Melbourne");
while (result.hasNext()) {
Result record = result.next();
System.out.println( "Found record, name = " + record.getProperty("name"));
}
});
}
The first parameter of the query method is the language to be used. In this case the common "SQL" is used. You can also use Gremlin or other languages that will be supported in the future.
The prepared statement is cached in the database, so further executions will be faster than the first one.
With prepared statements, the parameters can be passed in positional way, like in this case, or with a Map<String,Object> where the keys are the parameter names and the values the parameter values.
Example:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
Map<String,Object> parameters = new HashMap<>();
parameters.put( "age", 18 );
parameters.put( "city", "Melbourne" );
ResultSet result = db.query("SQL", "select from V where age > :age and city = :city", parameters);
while (result.hasNext()) {
Result record = result.next();
System.out.println( "Found record, name = " + record.getProperty("name"));
}
});
}
By using a map, parameters are referenced by name (:age and :city in this example).
2.6.6. Create a Graph
Now that we’re familiar with the most basic operations, let’s see how to work with graphs. Before creating our vertices and edges, we have to create the vertex and edge types. In our example, we’re going to create a minimal social network with the "User" type for vertices and "IsFriendOf" to map the friendship relationship:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
// CREATE THE ACCOUNT TYPE
db.getSchema().createVertexType("User");
db.getSchema().createEdgeType("IsFriendOf");
});
}
Now let’s create two "User" vertices and connect them with the friendship relationship "IsFriendOf", like in the chart below:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
MutableVertex albert = db.newVertex("User").set("name", "Albert").set("lastName", "Einstein").save();
MutableVertex michelle = db.newVertex("User").set("name", "Michelle").set("lastName", "Besso").save();
albert.newEdge("IsFriendOf", michelle, true, "since", 2010);
});
}
In the code snippet above, we have just created our first graph, made of two vertices and one edge that connects them.
Vertices and documents are not persistent until you call the save() method.
Note the 3rd parameter in the newEdge() method.
It tells the graph engine that we want a bidirectional edge.
In this way, even though the direction is still from the "Albert" vertex to the "Michelle" vertex, we can traverse the edge from both sides.
Always use bidirectional edges, unless you need to avoid creating super-nodes and it’s necessary to traverse only from one side.
Note also that we stored a property "since = 2010" in the edge.
That’s right, edges can have properties like vertices.
2.6.7. Traverse the Graph
What do you do with a brand new graph? Traversing, of course!
You have basically four ways to do that (Java API, SQL, Apache Gremlin, and openCypher), each one with its pros and cons:
| | Java API | SQL | Apache Gremlin | openCypher |
|---|---|---|---|---|
| Speed | * * * | * * | * * | * * |
| Flexibility | * * * | * | * * | * * |
| Embedded mode | Yes | Yes | Yes | Yes |
| Remote mode | No | Yes | Yes (through the Gremlin Server plugin) | Yes (through the Gremlin Server plugin) |
When should you use the API, and when SQL or Apache Gremlin?
The API is code based: you have total control over the query/traversal.
With SQL, you can combine the SELECT with the MATCH statement to create powerful traversals in just a few lines.
You could use Apache Gremlin if you’re coming from another graph database that supports this language.
Traverse via API
In order to start traversing a graph, you need your root vertex (in some cases you want to start from multiple root vertices). You can load your root vertex by its RID (Record ID), via indexed properties, or via a SQL query.
Loading a record by its RID is the fastest way, and the execution time remains constant as the database grows (algorithmic complexity: O(1)).
Example of lookup by RID:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
// #10:232 in our example is Albert Einstein's RID
Vertex albert = db.lookupByRID( new RID(db, "#10:232"), true );
});
}
In order to have a quick lookup, it’s always suggested to create an index on one or multiple properties.
In our case, we could index the properties "name" and "lastName" with two separate indexes, or create a composite index with both properties.
In this case the algorithmic complexity is O(log N).
Example:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
db.getSchema().createTypeIndex(SchemaImpl.INDEX_TYPE.LSM_TREE, false, "User", new String[] { "name", "lastName" });
});
}
Now we’re able to load Michelle’s vertex in a flash by using this:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
Vertex michelle = (Vertex) db.lookupByKey( "Profile", new String[]{"name", "lastName"}, new Object[]{"Michelle", "Besso"} ).next().getRecord();
});
}
Remember that loading a record by its RID is always faster than looking up from an index. What about the query approach? ArcadeDB supports SQL, so try this:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
ResultSet result = db.query( "SQL", "select from Profile where name = ? and lastName = ?", "Michelle", "Besso" );
Vertex michelle = result.next().getVertex().get();
});
}
With the query approach, if a suitable index is available it is used automatically; otherwise a full scan is executed.
Now that we have loaded the root vertex in memory, we are ready to do some traversal. Before looking at the API, it is important to understand that every edge has a direction: from vertex A to vertex B. In the example above, the direction of the friendship is from "Albert" to "Michelle". While in most cases the direction is important, sometimes, as with friendship, it does not really matter: if A is a friend of B, the opposite is also true.
In our example, the relationship is Albert ---IsFriendOf--> Michelle.
This means that if I want to retrieve all of Albert's friends, I can start from the vertex "Albert" and traverse all the outgoing edges of type "IsFriendOf".
Instead, if I want to retrieve all of Michelle's friends, I can start from Michelle as the root vertex and traverse all the incoming edges.
When the direction does not matter (as with friendship), I can consider both outgoing and incoming edges.
So the basic traversal operations from one or more vertices are:
-
outgoing, expressed as OUT
-
incoming, expressed as IN
-
both, expressed as BOTH
In order to load Michelle's friends, here is the example using the API:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
Vertex michelle; // ALREADY LOADED VIA RID, KEYS OR SQL
Iterable<Vertex> friends = michelle.getVertices(DIRECTION.IN, "IsFriendOf" );
});
}
Instead, if I start from Albert’s vertex, it would be:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
Vertex albert; // ALREADY LOADED VIA RID, KEYS OR SQL
Iterable<Vertex> friends = albert.getVertices(DIRECTION.OUT, "IsFriendOf");
});
}
Traverse via SQL
By using SQL, you can do the traversal by using SELECT:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
ResultSet friends = db.query( "SQL", "SELECT expand( out('IsFriendOf') ) FROM Profile WHERE name = ? AND lastName = ?", "Michelle", "Besso" );
});
}
Or with the more powerful MATCH statement:
try( Database db = databaseFactory.open(); ){
db.transaction( () -> {
ResultSet friends = db.query( "SQL", "MATCH {type: Profile, as: Profile, where: (name = ? and lastName = ?)}.out('IsFriendOf') {as: Friend} RETURN Friend", "Michelle", "Besso" );
});
}
Traverse via Apache Gremlin
Since ArcadeDB is 100% compliant with Gremlin 3.7.x, you can run this query against the Apache Gremlin Server configured with ArcadeDB:
g.V().has('name','Michelle').has('lastName','Besso').out('IsFriendOf');
For more information about Apache Gremlin see: Gremlin API support
Traverse via Open Cypher
ArcadeDB also supports Open Cypher. The same query would be the following:
MATCH (me)-[:IsFriendOf]-(friend)
WHERE me.name = 'Michelle' and me.lastName = 'Besso'
RETURN friend.name, friend.lastName
For more information about Cypher see: Cypher support
2.7. 10-Minute Tutorial (Remote)
The ArcadeDB Server is accessible remotely through the HTTP/JSON protocol. The protocol is very simple, so you do not need a driver: every modern programming language provides an easy way to execute HTTP requests and parse JSON.
For the examples in this tutorial we’re going to use curl.
Every request must be authenticated by passing user and password as HTTP basic authentication (in HTTP Headers).
In the examples below we’re going to always use "root" user with password "arcadedb-password".
Under Windows (PowerShell), single and double quotes inside a single- or double-quoted string need to be replaced with their Unicode escape representations: \u0022 (double quote) and \u0027 (single quote).
This is, for example, the case in the data argument (-d) of POST requests.
Let’s first create an empty database "school" on the server:
$ curl -X POST http://localhost:2480/api/v1/server \
-d '{ "command": "create database school" }' \
-H "Content-Type: application/json" \
--user root:arcadedb-password
Now let’s create the type "Class":
$ curl -X POST http://localhost:2480/api/v1/command/school \
-d '{ "language": "sql", "command": "create document type Class"}' \
-H "Content-Type: application/json" \
--user root:arcadedb-password
We could insert our first Class by using SQL:
$ curl -X POST http://localhost:2480/api/v1/command/school \
-d '{ "language": "sql", "command": "insert into Class set name = '\''English'\'', location = '\''3rd floor'\''"}' \
-H "Content-Type: application/json" \
--user root:arcadedb-password
Or better, using parameters with SQL:
$ curl -X POST http://localhost:2480/api/v1/command/school \
-d '{ "language": "sql", "command": "insert into Class set name = :name, location = :location", "params": { "name": "English", "location": "3rd floor" }}' \
-H "Content-Type: application/json" \
--user root:arcadedb-password
For more detailed information about the HTTP/JSON protocol, see the HTTP/JSON API section.
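If you prefer scripting these calls, the same requests can be built with Python's standard library alone. This is a sketch, not an official client; the endpoint, credentials, and payload mirror the curl examples above:

```python
import base64
import json
import urllib.request

def build_request(url: str, payload: dict, user: str, password: str) -> urllib.request.Request:
    """Build an authenticated POST request with a JSON body, mirroring the curl calls above."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

# Equivalent of the parameterized INSERT above (sending it requires a running server):
req = build_request(
    "http://localhost:2480/api/v1/command/school",
    {"language": "sql",
     "command": "insert into Class set name = :name, location = :location",
     "params": {"name": "English", "location": "3rd floor"}},
    "root", "arcadedb-password",
)
# urllib.request.urlopen(req) would execute it against a live server
```

The request object can then be sent with `urllib.request.urlopen(req)` and the JSON response parsed with `json.load()`.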
2.8. Python Quickstart
This tutorial shows how to connect to ArcadeDB from Python using the PostgreSQL wire protocol. You will create a graph, run queries, and use vector search — all from a Python script.
2.8.1. Prerequisites
-
ArcadeDB running with PostgreSQL protocol enabled (default port 5432)
-
Python 3.10+
-
psycopg library
pip install "psycopg[binary]>=3.1,<4"
2.8.2. Connect to ArcadeDB
ArcadeDB speaks the PostgreSQL wire protocol, so you can use standard PostgreSQL drivers. No special SDK is needed.
import psycopg

conn = psycopg.connect(
    host='localhost',
    port=5432,
    dbname='mydb',
    user='root',
    password='arcadedb',
    autocommit=True,
)
print('Connected to ArcadeDB')
2.8.3. Create a Schema
Use SQL to create vertex and edge types:
with conn.cursor() as cur:
    # Create vertex types
    cur.execute("CREATE VERTEX TYPE Person IF NOT EXISTS")
    cur.execute("CREATE VERTEX TYPE Movie IF NOT EXISTS")
    # Create edge type
    cur.execute("CREATE EDGE TYPE Acted IF NOT EXISTS")
    # Insert data
    cur.execute("CREATE VERTEX Person SET name = 'Alice', age = 30")
    cur.execute("CREATE VERTEX Person SET name = 'Bob', age = 25")
    cur.execute("CREATE VERTEX Movie SET title = 'The Matrix', year = 1999")
    cur.execute("""
        CREATE EDGE Acted FROM (SELECT FROM Person WHERE name = 'Alice')
        TO (SELECT FROM Movie WHERE title = 'The Matrix')
        SET role = 'Trinity'
    """)
2.8.4. Query with SQL
with conn.cursor() as cur:
    cur.execute("""
        SELECT person.name, movie.title, acted.role
        FROM MATCH {type: Person, as: person}
          -Acted-> {type: Movie, as: movie}
    """)
    for row in cur.fetchall():
        print(f'{row[0]} acted in {row[1]} as {row[2]}')
2.8.5. Query with Cypher
ArcadeDB supports Cypher queries through the same connection:
with conn.cursor() as cur:
    cur.execute("{cypher} MATCH (p:Person)-[a:Acted]->(m:Movie) RETURN p.name, m.title, a.role")
    for row in cur.fetchall():
        print(f'{row[0]} acted in {row[1]} as {row[2]}')
Prefix Cypher queries with {cypher} when using the PostgreSQL protocol.
2.8.6. Vector Search
ArcadeDB supports vector similarity search. Add embeddings to your data and query by similarity:
with conn.cursor() as cur:
    # Create a type with a vector property
    cur.execute("CREATE VERTEX TYPE Product IF NOT EXISTS")
    cur.execute("CREATE PROPERTY Product.embedding IF NOT EXISTS LIST")
    cur.execute("CREATE INDEX ON Product (embedding) VECTOR (4, COSINE, LSM_VECTOR)")
    # Insert products with embeddings
    cur.execute("CREATE VERTEX Product SET name = 'Laptop', embedding = [0.9, 0.1, 0.8, 0.2]")
    cur.execute("CREATE VERTEX Product SET name = 'Tablet', embedding = [0.8, 0.2, 0.7, 0.3]")
    cur.execute("CREATE VERTEX Product SET name = 'Phone', embedding = [0.7, 0.3, 0.6, 0.4]")
    cur.execute("CREATE VERTEX Product SET name = 'Book', embedding = [0.1, 0.9, 0.2, 0.8]")
    # Find the 3 most similar products to a query vector
    cur.execute("""
        SELECT name, distance FROM (
            SELECT expand(vectorNeighbors('Product[embedding]', [0.85, 0.15, 0.75, 0.25], 3))
        )
    """)
    print('Similar products:')
    for row in cur.fetchall():
        print(f'  {row[0]}')
2.8.7. Full Example
Here is a complete script that connects, creates data, and queries:
#!/usr/bin/env python3
"""ArcadeDB Python quickstart -- graph + vector search."""
import psycopg

def main():
    conn = psycopg.connect(
        host='localhost', port=5432, dbname='mydb',
        user='root', password='arcadedb', autocommit=True,
    )
    with conn.cursor() as cur:
        # Create schema
        cur.execute("CREATE VERTEX TYPE Person IF NOT EXISTS")
        cur.execute("CREATE VERTEX TYPE Movie IF NOT EXISTS")
        cur.execute("CREATE EDGE TYPE Acted IF NOT EXISTS")
        # Insert data
        cur.execute("CREATE VERTEX Person SET name = 'Alice', age = 30")
        cur.execute("CREATE VERTEX Movie SET title = 'The Matrix', year = 1999")
        cur.execute("""
            CREATE EDGE Acted FROM (SELECT FROM Person WHERE name = 'Alice')
            TO (SELECT FROM Movie WHERE title = 'The Matrix')
            SET role = 'Trinity'
        """)
        # Query with SQL MATCH
        cur.execute("""
            SELECT person.name, movie.title, acted.role
            FROM MATCH {type: Person, as: person}
              -Acted-> {type: Movie, as: movie}
        """)
        for row in cur.fetchall():
            print(f'{row[0]} acted in {row[1]} as {row[2]}')
    conn.close()

if __name__ == '__main__':
    main()
2.8.8. Next Steps
-
IAM Use Case — Full Python implementation with 7 query patterns
2.9. JavaScript / TypeScript Quickstart
This tutorial shows how to connect to ArcadeDB from Node.js using the PostgreSQL wire protocol. You will create a graph, run queries, and use vector search — all from JavaScript.
2.9.1. Prerequisites
-
ArcadeDB running with PostgreSQL protocol enabled (default port 5432)
-
Node.js 18+
-
pg library
npm install pg
2.9.2. Connect to ArcadeDB
ArcadeDB speaks the PostgreSQL wire protocol, so you can use the standard pg client. No special SDK is needed.
const { Client } = require('pg');
const client = new Client({
host: 'localhost',
port: 5432,
database: 'mydb',
user: 'root',
password: 'arcadedb',
});
await client.connect();
console.log('Connected to ArcadeDB');
2.9.3. Create a Schema
Use SQL to create vertex and edge types:
// Create vertex types
await client.query('CREATE VERTEX TYPE Person IF NOT EXISTS');
await client.query('CREATE VERTEX TYPE Movie IF NOT EXISTS');
// Create edge type
await client.query('CREATE EDGE TYPE Acted IF NOT EXISTS');
// Insert data
await client.query("CREATE VERTEX Person SET name = 'Alice', age = 30");
await client.query("CREATE VERTEX Person SET name = 'Bob', age = 25");
await client.query("CREATE VERTEX Movie SET title = 'The Matrix', year = 1999");
await client.query(`
CREATE EDGE Acted FROM (SELECT FROM Person WHERE name = 'Alice')
TO (SELECT FROM Movie WHERE title = 'The Matrix')
SET role = 'Trinity'
`);
2.9.4. Query with SQL
const result = await client.query(`
SELECT person.name, movie.title, acted.role
FROM MATCH {type: Person, as: person}
-Acted-> {type: Movie, as: movie}
`);
for (const row of result.rows) {
console.log(`${row.name} acted in ${row.title} as ${row.role}`);
}
2.9.5. Query with Cypher
ArcadeDB supports Cypher queries through the same connection:
const result = await client.query(
'{cypher} MATCH (p:Person)-[a:Acted]->(m:Movie) RETURN p.name, m.title, a.role'
);
for (const row of result.rows) {
console.log(`${row['p.name']} acted in ${row['m.title']} as ${row['a.role']}`);
}
Prefix Cypher queries with {cypher} when using the PostgreSQL protocol.
2.9.6. Vector Search
Add embeddings to your data and query by similarity:
// Create a type with a vector property
await client.query('CREATE VERTEX TYPE Product IF NOT EXISTS');
await client.query('CREATE PROPERTY Product.embedding IF NOT EXISTS LIST');
await client.query('CREATE INDEX ON Product (embedding) VECTOR (4, COSINE, LSM_VECTOR)');
// Insert products with embeddings
await client.query("CREATE VERTEX Product SET name = 'Laptop', embedding = [0.9, 0.1, 0.8, 0.2]");
await client.query("CREATE VERTEX Product SET name = 'Tablet', embedding = [0.8, 0.2, 0.7, 0.3]");
await client.query("CREATE VERTEX Product SET name = 'Phone', embedding = [0.7, 0.3, 0.6, 0.4]");
await client.query("CREATE VERTEX Product SET name = 'Book', embedding = [0.1, 0.9, 0.2, 0.8]");
// Find the 3 most similar products to a query vector
const similar = await client.query(`
SELECT name, distance FROM (
SELECT expand(vectorNeighbors('Product[embedding]', [0.85, 0.15, 0.75, 0.25], 3))
)
`);
console.log('Similar products:');
for (const row of similar.rows) {
console.log(` ${row.name}`);
}
2.9.7. Full Example
#!/usr/bin/env node
/**
* ArcadeDB JavaScript quickstart -- graph + vector search.
*/
const { Client } = require('pg');
async function main() {
const client = new Client({
host: 'localhost', port: 5432, database: 'mydb',
user: 'root', password: 'arcadedb',
});
await client.connect();
// Create schema
await client.query('CREATE VERTEX TYPE Person IF NOT EXISTS');
await client.query('CREATE VERTEX TYPE Movie IF NOT EXISTS');
await client.query('CREATE EDGE TYPE Acted IF NOT EXISTS');
// Insert data
await client.query("CREATE VERTEX Person SET name = 'Alice', age = 30");
await client.query("CREATE VERTEX Movie SET title = 'The Matrix', year = 1999");
await client.query(`
CREATE EDGE Acted FROM (SELECT FROM Person WHERE name = 'Alice')
TO (SELECT FROM Movie WHERE title = 'The Matrix')
SET role = 'Trinity'
`);
// Query with SQL MATCH
const result = await client.query(`
SELECT person.name, movie.title, acted.role
FROM MATCH {type: Person, as: person}
-Acted-> {type: Movie, as: movie}
`);
for (const row of result.rows) {
console.log(`${row.name} acted in ${row.title} as ${row.role}`);
}
await client.end();
}
main().catch(console.error);
2.9.8. Next Steps
-
Supply Chain Use Case — Full JavaScript implementation with 7 query patterns
-
HTTP/JSON API — Alternative: use fetch() with ArcadeDB’s REST API
2.10. Vector Search Tutorial
This tutorial walks you through building a semantic search system with ArcadeDB. You will create vector embeddings, index them, query by similarity, and combine vector search with graph traversal.
2.10.1. What You Will Build
A product catalog with semantic search: given a query like "portable computing device", find the most relevant products by embedding similarity rather than keyword matching.
2.10.2. Prerequisites
-
ArcadeDB running (Docker or binary install)
-
A way to send queries (Console, HTTP API, or a Python/JavaScript client)
2.10.3. Step 1: Create the Schema
Create a vertex type with a vector property:
CREATE VERTEX TYPE Product
CREATE PROPERTY Product.name STRING
CREATE PROPERTY Product.category STRING
CREATE PROPERTY Product.embedding LIST OF FLOAT
2.10.4. Step 2: Create a Vector Index
Create an LSM_VECTOR index on the embedding property. Specify the number of dimensions and the similarity metric:
CREATE INDEX ON Product (embedding) LSM_VECTOR METADATA {
dimensions: 4,
similarity: 'COSINE'
}
In production, embeddings are typically 384-1536 dimensions. This tutorial uses 4 dimensions for simplicity. For production workloads, add quantization: 'INT8' for significantly better search performance — see Why INT8 is faster below.
2.10.5. Step 3: Insert Data with Embeddings
Insert products with pre-computed embedding vectors:
CREATE VERTEX Product SET name = 'Laptop', category = 'Electronics', embedding = [0.9, 0.1, 0.8, 0.2]
CREATE VERTEX Product SET name = 'Tablet', category = 'Electronics', embedding = [0.85, 0.15, 0.75, 0.25]
CREATE VERTEX Product SET name = 'Smartphone', category = 'Electronics', embedding = [0.8, 0.2, 0.7, 0.3]
CREATE VERTEX Product SET name = 'Headphones', category = 'Electronics', embedding = [0.6, 0.4, 0.5, 0.5]
CREATE VERTEX Product SET name = 'Novel', category = 'Books', embedding = [0.1, 0.9, 0.2, 0.8]
CREATE VERTEX Product SET name = 'Textbook', category = 'Books', embedding = [0.2, 0.8, 0.3, 0.7]
CREATE VERTEX Product SET name = 'Running Shoes', category = 'Sports', embedding = [0.3, 0.5, 0.9, 0.1]
CREATE VERTEX Product SET name = 'Yoga Mat', category = 'Sports', embedding = [0.25, 0.55, 0.85, 0.15]
In a real application, you would generate embeddings using an external model such as OpenAI’s text-embedding-3-small (1536 dimensions) or Sentence Transformers' all-MiniLM-L6-v2 (384 dimensions).
2.10.6. Step 4: Query by Similarity
Find the 3 products most similar to a query vector:
SELECT name, category, distance FROM (
SELECT expand(vectorNeighbors('Product[embedding]', [0.88, 0.12, 0.78, 0.22], 3))
)
The vectorNeighbors() function returns a list of results — expand() flattens it into individual rows so you can access properties like name, category, and distance directly. The query vector [0.88, 0.12, 0.78, 0.22] is close to the electronics cluster. The results should return Laptop, Tablet, and Smartphone — the three most similar items by cosine similarity.
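You can check this ranking by hand. Cosine distance is 1 minus the cosine similarity of the two vectors; the following sketch recomputes the nearest neighbors over the sample embeddings from Step 3 in plain Python:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

products = {
    'Laptop':        [0.9, 0.1, 0.8, 0.2],
    'Tablet':        [0.85, 0.15, 0.75, 0.25],
    'Smartphone':    [0.8, 0.2, 0.7, 0.3],
    'Headphones':    [0.6, 0.4, 0.5, 0.5],
    'Novel':         [0.1, 0.9, 0.2, 0.8],
    'Textbook':      [0.2, 0.8, 0.3, 0.7],
    'Running Shoes': [0.3, 0.5, 0.9, 0.1],
    'Yoga Mat':      [0.25, 0.55, 0.85, 0.15],
}
query = [0.88, 0.12, 0.78, 0.22]
top3 = sorted(products, key=lambda n: cosine_distance(query, products[n]))[:3]
print(top3)
```

The three smallest distances belong to Laptop, Tablet, and Smartphone, matching the index result.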
2.10.7. Step 5: Add Graph Relationships
Make the example more interesting by adding edges between products:
CREATE EDGE TYPE FREQUENTLY_BOUGHT_WITH
CREATE EDGE TYPE SIMILAR_TO
CREATE EDGE FREQUENTLY_BOUGHT_WITH
FROM (SELECT FROM Product WHERE name = 'Laptop')
TO (SELECT FROM Product WHERE name = 'Headphones')
CREATE EDGE SIMILAR_TO
FROM (SELECT FROM Product WHERE name = 'Laptop')
TO (SELECT FROM Product WHERE name = 'Tablet')
CREATE EDGE FREQUENTLY_BOUGHT_WITH
FROM (SELECT FROM Product WHERE name = 'Novel')
TO (SELECT FROM Product WHERE name = 'Textbook')
2.10.8. Step 6: Combine Vector Search with Graph Traversal
Find similar products, then expand recommendations through graph relationships:
-- Step 1: Find top 3 by vector similarity
SELECT name, category, distance FROM (
SELECT expand(vectorNeighbors('Product[embedding]', [0.88, 0.12, 0.78, 0.22], 3))
)
Then traverse from those results to find related products:
-- Step 2: Get products frequently bought with the top match
SELECT friend.name, friend.category
FROM MATCH {type: Product, as: product, where: (name = 'Laptop')}
-FREQUENTLY_BOUGHT_WITH-> {type: Product, as: friend}
This two-step pattern — vector search to find semantically similar items, then graph traversal to expand through relationships — is the foundation of the Graph RAG and Recommendation Engine patterns.
2.10.9. Step 7: Use Different Similarity Metrics
Create additional indexes with different metrics for comparison:
-- Euclidean distance (absolute distance in vector space)
CREATE PROPERTY Product.embedding_l2 LIST OF FLOAT
CREATE INDEX ON Product (embedding_l2) LSM_VECTOR METADATA {
dimensions: 4,
similarity: 'EUCLIDEAN'
}
-- Dot product (fastest for normalized vectors)
CREATE PROPERTY Product.embedding_dot LIST OF FLOAT
CREATE INDEX ON Product (embedding_dot) LSM_VECTOR METADATA {
dimensions: 4,
similarity: 'DOT_PRODUCT'
}
2.10.10. Step 8: Control Search Quality with efSearch
By default, ArcadeDB uses an adaptive search strategy that works well for most queries. You can override it per-query by passing efSearch as the 4th argument:
-- Higher efSearch for better recall (useful for critical queries)
SELECT name, category, distance FROM (
SELECT expand(vectorNeighbors('Product[embedding]', [0.88, 0.12, 0.78, 0.22], 3, 200))
)
See Adaptive efSearch for details on how the default strategy works.
2.10.11. Step 9: Enable Quantization for Large Datasets
For production datasets with many vectors, enable INT8 quantization to reduce memory by 75%:
CREATE VERTEX TYPE LargeProduct
CREATE PROPERTY LargeProduct.embedding ARRAY_OF_FLOATS
CREATE INDEX ON LargeProduct (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE',
quantization: 'INT8'
}
Queries work exactly the same — quantization is transparent:
SELECT name, distance FROM (
SELECT expand(vectorNeighbors('LargeProduct[embedding]', $queryVector, 10))
)
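To see where the 75% saving comes from: each float32 component occupies 4 bytes, while an int8 component occupies 1. The sketch below shows a simple symmetric quantization scheme for illustration only; ArcadeDB's internal scheme may differ in detail:

```python
def quantize_int8(vec):
    """Map each float component to an int8 in [-127, 127] using a per-vector scale."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    """Recover approximate floats from quantized components."""
    return [x * scale for x in q]

vec = [0.9, 0.1, 0.8, 0.2]
q, scale = quantize_int8(vec)
approx = dequantize(q, scale)
# Each component now fits in 1 byte instead of 4 (a 75% reduction),
# at the cost of a small rounding error per component.
```

The recall loss from this rounding is usually small because neighbor ranking depends on relative, not absolute, distances.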
2.10.12. Next Steps
-
Vector Search Concepts — Architecture, algorithms, and parameter tuning
-
Vector Embeddings How-To — Production best practices
-
Recommendation Engine — Full use case with vector + graph + time-series
-
Graph RAG — Vector + graph for LLM retrieval augmentation
2.11. Time Series Tutorial
This tutorial walks you through ingesting and querying time series data with ArcadeDB. You will create a sensor monitoring system that ingests temperature readings and queries them with SQL aggregations, PromQL, and continuous aggregates.
2.11.1. Prerequisites
-
ArcadeDB running (Docker or binary install)
-
curl for HTTP requests
2.11.2. Step 1: Create a TimeSeries Type
Create a type to store sensor readings with temperature, humidity, and pressure fields:
CREATE TIMESERIES TYPE SensorReading
TIMESTAMP ts PRECISION MILLISECOND
TAGS (sensor_id STRING, location STRING)
FIELDS (
temperature DOUBLE,
humidity DOUBLE,
pressure DOUBLE
)
SHARDS 4
-
TIMESTAMP — Every time series type needs a timestamp column
-
TAGS — Low-cardinality dimensions used for filtering (indexed automatically)
-
FIELDS — The measurement values
-
SHARDS — Parallel write/read partitions (default: CPU count)
2.11.3. Step 2: Ingest Data via InfluxDB Line Protocol
The fastest remote ingestion method. Send sensor readings using curl:
curl -X POST "http://localhost:2480/api/v1/ts/mydb/write?precision=ms" \
-u root:arcadedb \
-H "Content-Type: text/plain" \
--data-binary '
SensorReading,sensor_id=sensor-A,location=floor-1 temperature=22.5,humidity=65.0,pressure=1013.2 1708430400000
SensorReading,sensor_id=sensor-A,location=floor-1 temperature=22.7,humidity=64.5,pressure=1013.1 1708430460000
SensorReading,sensor_id=sensor-A,location=floor-1 temperature=23.1,humidity=63.0,pressure=1013.0 1708430520000
SensorReading,sensor_id=sensor-B,location=floor-2 temperature=19.8,humidity=70.0,pressure=1012.5 1708430400000
SensorReading,sensor_id=sensor-B,location=floor-2 temperature=19.5,humidity=71.0,pressure=1012.6 1708430460000
SensorReading,sensor_id=sensor-B,location=floor-2 temperature=19.2,humidity=72.0,pressure=1012.7 1708430520000
'
The format is: <type>,<tag>=<value> <field>=<value> <timestamp>
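Before sending larger batches it can help to validate lines locally. This sketch parses the simple, unescaped form used in this tutorial (it does not handle escaped spaces or commas):

```python
def parse_line(line):
    """Split a simple InfluxDB line-protocol line into measurement, tags, fields, timestamp."""
    head, fields_part, ts = line.split(' ')
    measurement, *tag_parts = head.split(',')
    tags = dict(t.split('=') for t in tag_parts)
    fields = {k: float(v) for k, v in (f.split('=') for f in fields_part.split(','))}
    return measurement, tags, fields, int(ts)

m, tags, fields, ts = parse_line(
    'SensorReading,sensor_id=sensor-A,location=floor-1 '
    'temperature=22.5,humidity=65.0,pressure=1013.2 1708430400000'
)
```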
2.11.4. Step 3: Ingest Data via SQL
You can also insert using standard SQL:
INSERT INTO SensorReading
(ts, sensor_id, location, temperature, humidity, pressure)
VALUES
('2026-02-20T10:10:00Z', 'sensor-A', 'floor-1', 23.5, 62.0, 1012.9),
('2026-02-20T10:10:00Z', 'sensor-B', 'floor-2', 19.0, 73.0, 1012.8)
2.11.5. Step 4: Query with Time Range Filters
Time range conditions on the timestamp column are pushed down to the storage engine for efficient scans:
SELECT ts, sensor_id, temperature, humidity
FROM SensorReading
WHERE ts BETWEEN '2026-02-20T10:00:00Z' AND '2026-02-20T11:00:00Z'
AND sensor_id = 'sensor-A'
ORDER BY ts
2.11.6. Step 5: Aggregate with Time Bucketing
Use ts.timeBucket() to group data into time intervals:
SELECT ts.timeBucket('1h', ts) AS hour,
sensor_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
min(temperature) AS min_temp,
count(*) AS samples
FROM SensorReading
WHERE ts BETWEEN '2026-02-20' AND '2026-02-21'
GROUP BY hour, sensor_id
ORDER BY hour
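Conceptually, ts.timeBucket('1h', ts) floors each timestamp to the start of its interval. A stdlib sketch of the same bucketing over epoch milliseconds:

```python
from datetime import datetime, timezone

def time_bucket(ts_ms: int, width_ms: int) -> int:
    """Floor an epoch-millisecond timestamp to the start of its bucket."""
    return ts_ms - (ts_ms % width_ms)

HOUR_MS = 3_600_000
ts = 1708430520000  # one of the sample timestamps ingested above
bucket = time_bucket(ts, HOUR_MS)
print(datetime.fromtimestamp(bucket / 1000, tz=timezone.utc).isoformat())
```

All readings whose timestamps fall in the same hour map to the same bucket value, which is what GROUP BY hour aggregates over.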
2.11.7. Step 6: Calculate Rates and Percentiles
-- Rate of change per minute
SELECT ts.timeBucket('5m', ts) AS window,
sensor_id,
ts.rate(temperature, ts) AS temp_change_per_sec
FROM SensorReading
GROUP BY window, sensor_id
-- 99th percentile temperature per hour
SELECT ts.timeBucket('1h', ts) AS hour,
ts.percentile(temperature, 0.99) AS p99_temp
FROM SensorReading
GROUP BY hour
2.11.8. Step 7: Fill Gaps with Interpolation
Sensor data often has gaps. Use ts.interpolate() to fill them:
SELECT ts.timeBucket('1m', ts) AS minute,
ts.interpolate(temperature, 'linear', ts) AS temp
FROM SensorReading
WHERE sensor_id = 'sensor-A'
AND ts BETWEEN '2026-02-20T10:00:00Z' AND '2026-02-20T11:00:00Z'
GROUP BY minute
Methods: 'zero' (fill with 0), 'prev' (forward fill), 'linear' (interpolate), 'none' (keep null).
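To make the methods concrete, here is an illustrative sketch of how three of the fill strategies treat a missing bucket; this mimics the behavior described above rather than ArcadeDB's internal implementation, and assumes the gap is interior to the series:

```python
def fill(values, method):
    """Fill None gaps in a list of bucketed values using the named strategy."""
    out = list(values)
    for i, v in enumerate(out):
        if v is not None:
            continue
        if method == 'zero':
            out[i] = 0.0
        elif method == 'prev':
            out[i] = out[i - 1] if i > 0 else None  # forward fill
        elif method == 'linear':
            # Interpolate between the previous value and the next known value
            j = next(k for k in range(i + 1, len(out)) if values[k] is not None)
            prev = out[i - 1]
            out[i] = prev + (values[j] - prev) / (j - i + 1)
    return out

series = [22.5, None, 23.1]  # one-minute buckets with a missing sample
```

fill(series, 'prev') repeats 22.5, fill(series, 'zero') inserts 0.0, and fill(series, 'linear') produces the midpoint 22.8.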
2.11.9. Step 8: Query with PromQL
ArcadeDB includes a native PromQL evaluator. Query via HTTP:
# Instant query
curl "http://localhost:2480/ts/mydb/prom/api/v1/query?query=avg(SensorReading{sensor_id='sensor-A'})" \
-u root:arcadedb
# Range query with rate calculation
curl "http://localhost:2480/ts/mydb/prom/api/v1/query_range?\
query=rate(SensorReading{sensor_id='sensor-A'}[5m])&\
start=1708430400&end=1708434000&step=60" \
-u root:arcadedb
Or use PromQL from within SQL:
RETURN promql('avg(SensorReading{sensor_id="sensor-A"})')
2.11.10. Step 9: Create a Continuous Aggregate
Pre-compute hourly summaries that update automatically:
CREATE CONTINUOUS AGGREGATE hourly_temps AS
SELECT ts.timeBucket('1h', ts) AS hour,
sensor_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
count(*) AS cnt
FROM SensorReading
GROUP BY hour, sensor_id
Query the aggregate like any table:
SELECT * FROM hourly_temps ORDER BY hour DESC LIMIT 10
New inserts into SensorReading automatically trigger incremental updates to the aggregate.
2.11.11. Step 10: Set Up Retention and Downsampling
Keep raw data for 90 days, then downsample:
ALTER TIMESERIES TYPE SensorReading
ADD DOWNSAMPLING POLICY
AFTER 7 DAYS GRANULARITY 1 MINUTES
AFTER 30 DAYS GRANULARITY 1 HOURS
Raw data is automatically downsampled by a background scheduler.
2.11.12. Next Steps
-
Time Series Concepts — Architecture, compression, HA, and full reference
-
Realtime Analytics Use Case — Complete IoT monitoring with Grafana
-
Time Series Quick Reference — Syntax cheat sheet
3. Use Cases
Explore real-world applications built with ArcadeDB. Each use case includes a complete, runnable project with Docker Compose, sample data, and queries in multiple languages.
Full source code: ArcadeDB Use Cases Repository
| Use Case | Description | Features |
|---|---|---|
| Recommendation Engine | Product recommendations via collaborative filtering and vector similarity | Graph, Vectors, Time-series |
| Knowledge Graphs | Academic research graph with co-authorship and citation networks | Graph, Vectors, Full-text, Time-series |
| Graph RAG | Retrieval-augmented generation with knowledge graphs and vector search | Graph, Vectors, Full-text, Bolt, LangChain4j |
| Fraud Detection | Multi-signal fraud detection unifying graph, vector, and time-series | Graph, Vectors, Time-series, Cypher |
| Realtime Analytics | IoT and service monitoring with Grafana dashboards | Time-series, Graph, Cypher, PromQL |
| Social Analytics | Social analytics with materialized view dashboards | Materialized views, Graph, Time-series |
| Supply Chain | Multi-tier supply chain visibility and traceability | Graph, Vectors, Time-series, PostgreSQL, JavaScript |
| IAM | Permission resolution through group/role hierarchies | Graph, Time-series, Vectors, PostgreSQL, Python |
| Customer 360 | Unified customer view with identity resolution and churn prediction | Graph, Documents, Vectors, Full-text, OpenCypher |
3.1. Recommendation Engine
Build intelligent product and content recommendations in a single multi-model database — no external recommendation service required. Graph traversal powers collaborative filtering through relationship patterns like User-PURCHASED→Product and User-WATCHED→Show, vector similarity drives content-based recommendations using vectorNeighbors() with LSM_VECTOR indexes on product and show embeddings, and time-series analysis detects trending items by aggregating interaction counts over time windows.
3.1.1. Architecture Overview
Users connect to products and shows through purchase, watch, and interaction edges. Each item carries a 4-dimensional embedding vector for content-based similarity. A ProductInteraction document type tracks interaction counts over time for trending analysis.
3.1.2. Key Queries
Collaborative Filtering — Find products purchased by users who bought the same items:
MATCH (u:User)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE u.name = 'Alice' AND NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name, count(other) AS score ORDER BY score DESC
Vector Similarity Search — Find products similar to a given product by embedding:
SELECT name, category, distance FROM (
SELECT expand(vectorNeighbors('Product[embedding]', [0.9, 0.1, 0.8, 0.2], 5))
) ORDER BY distance
Trending Detection — Identify trending products by recent interaction volume:
SELECT productId, SUM(purchaseCount) AS total
FROM ProductInteraction
WHERE timestamp > date('2024-01-14', 'yyyy-MM-dd')
GROUP BY productId ORDER BY total DESC
3.1.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/recommendation-engine
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: recommendation-engine on GitHub
3.2. Knowledge Graphs
Build a unified academic research system that integrates researchers, papers, institutions, and topics in a single database. Graph traversal drives multi-hop queries across co-authorship and citation networks, vector similarity enables semantic paper search via vectorNeighbors() on paper embeddings, full-text search supports keyword queries on abstracts using SEARCH_INDEX(), and time-series tracking monitors citation activity over time.
3.2.1. Architecture Overview
Papers carry 4-dimensional embedding vectors for semantic search and full-text indexed abstracts for keyword search. Citation activity is tracked as time-series documents.
3.2.2. Key Queries
Co-authorship Network — Discover collaborations between researchers:
MATCH (r:Researcher)-[:CO_AUTHORED]->(p:Paper)<-[:CO_AUTHORED]-(coauthor:Researcher)
WHERE r.name = 'Dr. Smith'
RETURN coauthor.name, collect(p.title) AS papers
Semantic Paper Search — Find papers similar to a research topic by embedding:
SELECT title, year, distance FROM (
SELECT expand(vectorNeighbors('Paper[embedding]', [0.8, 0.3, 0.7, 0.1], 5))
) ORDER BY distance
Full-Text Abstract Search — Keyword search across paper abstracts:
SELECT title, abstract FROM Paper
WHERE SEARCH_INDEX('Paper[abstract]', 'machine learning graph')
3.2.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/knowledge-graphs
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: knowledge-graphs on GitHub
3.3. Graph RAG
Implement retrieval-augmented generation (RAG) that retrieves richer, more connected context for LLM queries — all within a single database. Graph traversal enables multi-hop entity bridging across knowledge graph relationships, vector similarity powers semantic chunk retrieval using vectorNeighbors() with LSM_VECTOR indexes, full-text search provides keyword-based chunk lookup, and Neo4j Bolt protocol compatibility on port 7687 supports LangChain4j integration.
3.3.1. Architecture Overview
Document chunks carry embedding vectors and link to extracted entities through MENTIONS edges. Entities connect via RELATES_TO, enabling multi-hop discovery that bridges chunks from different documents through shared entity mentions.
3.3.2. Key Queries
Hybrid Vector + Graph Search — Find semantically similar chunks and expand through entity connections:
SELECT content, source, distance FROM (
SELECT expand(vectorNeighbors('Chunk[embedding]', [0.9, 0.1, 0.8, 0.2], 5))
)
Multi-Hop Entity Bridge — Discover related entities across documents:
MATCH (c:Chunk)-[:MENTIONS]->(e:Entity)-[:RELATES_TO*1..2]-(related:Entity)<-[:MENTIONS]-(other:Chunk)
WHERE c.source = 'quantum_computing.txt'
RETURN related.name, other.content, other.source
Composite Scoring — Combine vector distance with graph connectivity for ranked retrieval:
SELECT content, source,
(1.0 / (1.0 + distance)) * 0.7 + (entityCount / 5.0) * 0.3 AS compositeScore
FROM ChunkScores ORDER BY compositeScore DESC
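The composite score above is plain arithmetic over the two signals. A small Python sketch of the same formula (the 0.7/0.3 weights and the entity-count divisor of 5 come from the query; the helper name and sample values are illustrative):

```python
def composite_score(distance, entity_count, w_vec=0.7, w_graph=0.3):
    """Blend vector closeness with graph connectivity, as in the query above."""
    return (1.0 / (1.0 + distance)) * w_vec + (entity_count / 5.0) * w_graph

# An exact vector match mentioning 5 entities scores the maximum (~1.0);
# a distant chunk with no entity links scores much lower.
print(composite_score(0.0, 5))
print(composite_score(1.0, 0))
```

Chunks are then ranked by this score descending, mirroring the ORDER BY in the query.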
3.3.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/graph-rag
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: graph-rag on GitHub
3.4. Fraud Detection
Detect financial fraud by unifying four detection signals in a single multi-model database: graph traversal exposes organized fraud rings through shared identifier patterns such as devices, phones, and addresses; vector similarity flags behavioral anomalies using vectorCosineSimilarity() on transaction embeddings; time-series analysis catches velocity attacks through temporal transaction patterns; and document queries resolve synthetic identities by detecting duplicate SSNs.
3.4.1. Architecture Overview
Schema: vertex and edge types.
Accounts and customers connect through shared devices, phones, and addresses. Overlapping identifiers reveal fraud rings. Customers carry profile_embedding vectors and transactions carry behavior_embedding vectors for anomaly detection.
3.4.2. Key Queries
Fraud Ring Detection — Identify accounts sharing devices or phones:
MATCH (a1:Account)-[:USES_DEVICE]->(d:Device)<-[:USES_DEVICE]-(a2:Account)
WHERE a1 <> a2
RETURN a1.name, a2.name, d.deviceId AS sharedDevice
Synthetic Identity Detection — Find accounts sharing SSNs:
SELECT a1.name, a2.name, a1.ssn
FROM Account a1, Account a2
WHERE a1.ssn = a2.ssn AND a1 != a2
Behavioral Anomaly Detection — Compare transaction embeddings against baselines:
SELECT name, vectorCosineSimilarity(behavior_embedding, [0.1, 0.9, 0.1, 0.8]) AS similarity
FROM Transaction
WHERE vectorCosineSimilarity(behavior_embedding, [0.1, 0.9, 0.1, 0.8]) < 0.7
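vectorCosineSimilarity() computes the standard cosine of the angle between two vectors. A self-contained Python version of the same math, with an invented baseline and transaction embedding:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

baseline = [0.1, 0.9, 0.1, 0.8]
normal_txn = [0.1, 0.9, 0.1, 0.8]  # matches the baseline exactly
anomalous = [0.9, 0.1, 0.8, 0.1]   # inverted behavior profile

print(round(cosine_similarity(baseline, normal_txn), 3))
print(round(cosine_similarity(baseline, anomalous), 3))
```

A similarity below the 0.7 threshold used in the query flags the transaction as anomalous.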
3.4.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/fraud-detection
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: fraud-detection on GitHub
3.4.4. Related Documentation
-
Louvain Community Detection — Identify fraud rings as communities
-
Cycle Detection — Detect circular money flows
-
Dijkstra Shortest Path — Trace shortest paths between suspicious accounts
3.5. Realtime Analytics
Replace a fragmented stack of specialized monitoring tools with a single database that integrates time-series sensor data, service metrics, and infrastructure topology. Time-series native functions — ts.timeBucket(), ts.rate(), ts.percentile(), and ts.interpolate() — power sensor and service metric analysis, while graph traversal maps building topology and service dependencies, enabling multi-model correlation that combines graph-based entity identification with time-series aggregation in unified queries.
3.5.1. Architecture Overview
Schema: time-series, vertex, and edge types.
Sensors are installed on floors within buildings; services run on servers and depend on other services. Time-series data links to graph topology, enabling queries like "show the temperature trend for all sensors on the 2nd floor of Building A."
3.5.2. Key Queries
Hourly Temperature Bucketing — Aggregate sensor readings by hour:
SELECT sensor_id, ts.timeBucket(timestamp, 'PT1H') AS hour,
AVG(temperature) AS avg_temp, MAX(temperature) AS max_temp
FROM SensorReading
GROUP BY sensor_id, hour ORDER BY hour
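The same bucketing can be sketched in plain Python: truncate each timestamp to the start of its hour, then aggregate per bucket. The sensor readings below are invented for illustration:

```python
from collections import defaultdict
from datetime import datetime

# Illustrative (timestamp, temperature) readings for one sensor.
readings = [
    (datetime(2024, 1, 15, 9, 5), 21.0),
    (datetime(2024, 1, 15, 9, 40), 23.0),
    (datetime(2024, 1, 15, 10, 10), 25.0),
]

def time_bucket(ts):
    """Truncate a timestamp to the start of its hour, like ts.timeBucket(ts, 'PT1H')."""
    return ts.replace(minute=0, second=0, microsecond=0)

buckets = defaultdict(list)
for ts, temp in readings:
    buckets[time_bucket(ts)].append(temp)

for hour, temps in sorted(buckets.items()):
    print(hour, "avg:", sum(temps) / len(temps), "max:", max(temps))
```

Grouping by the truncated timestamp corresponds to GROUP BY sensor_id, hour in the SQL above.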
Service Request Rate — Calculate request rate per service:
SELECT service_id, ts.rate(timestamp, request_count, 'PT5M') AS req_per_5min
FROM ServiceMetrics
WHERE service_id = 'api-gateway'
Service Impact Analysis — Trace service dependencies for outage impact:
MATCH path = (s:Service {name: 'database'})<-[:DEPENDS_ON*1..3]-(dependent:Service)
RETURN dependent.name AS affected_service, length(path) AS hops
3.5.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/realtime-analytics
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: realtime-analytics on GitHub
3.6. Social Network Analytics
Analyze social network dynamics — trending content, user influence, viral propagation, and engagement metrics — in a unified platform. Materialized views with three refresh modes (PERIODIC for trending dashboards, INCREMENTAL for post counts on commit, and MANUAL for on-demand influence scores) pre-compute key analytics, graph traversal powers viral spread chain analysis and community overlap detection, time-series tracking captures engagement metrics as historical snapshots, and polyglot query support lets you use SQL for views and aggregations alongside OpenCypher for graph traversals.
3.6.1. Architecture Overview
Schema: vertex, edge, and document types.
Users create and interact with posts, follow each other, and belong to groups. Engagement metrics are captured as time-series snapshots. Materialized views pre-compute trending scores, post counts, and influence rankings.
3.6.2. Key Queries
Trending Content Dashboard — Aggregate engagement into trending scores (via PERIODIC materialized view):
SELECT postId, SUM(likes + shares * 2 + comments * 3) AS trendingScore
FROM EngagementMetric
WHERE timestamp > date('2024-01-15', 'yyyy-MM-dd')
GROUP BY postId ORDER BY trendingScore DESC
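The trending score is a simple weighted sum of engagement counts. A Python sketch of the same weighting (post IDs and counts are invented):

```python
def trending_score(likes, shares, comments):
    """Weighted engagement, matching the SQL above: shares x2, comments x3."""
    return likes + shares * 2 + comments * 3

# postId -> (likes, shares, comments), illustrative values only.
posts = {"p1": (10, 5, 2), "p2": (100, 0, 0), "p3": (0, 20, 10)}

ranked = sorted(posts, key=lambda p: trending_score(*posts[p]), reverse=True)
print(ranked)  # ['p2', 'p3', 'p1']
```

Sorting descending by score reproduces the ORDER BY trendingScore DESC of the materialized view.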
Viral Spread Chain — Trace how content propagates through the network:
MATCH path = (origin:User)-[:CREATED]->(p:Post)<-[:SHARED]-(sharer:User)-[:FOLLOWS]->(reached:User)
WHERE origin.name = 'Alice'
RETURN origin.name, sharer.name, reached.name, p.title
Community Overlap — Find users active in multiple groups:
MATCH (u:User)-[:MEMBER_OF]->(g1:Group), (u)-[:MEMBER_OF]->(g2:Group)
WHERE g1 <> g2
RETURN u.name, g1.name, g2.name
3.6.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/social-network-analytics
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: social-network-analytics on GitHub
3.6.4. Related Documentation
-
PageRank — Rank user influence in the network
-
Louvain Community Detection — Discover user communities
-
Betweenness Centrality — Identify bridge users connecting communities
-
Label Propagation — Fast community assignment
3.7. Supply Chain
Achieve multi-tier supply chain visibility in a single database by combining graph traversal for multi-tier supplier discovery, blast radius analysis, and end-to-end batch traceability via variable-length path queries, vector similarity for alternative supplier sourcing using capability embeddings with vectorNeighbors(), time-series analysis for delivery disruption detection via metric aggregation, and the PostgreSQL wire protocol for JavaScript connectivity.
3.7.1. Architecture Overview
Schema: vertex, edge, and document types.
Suppliers carry 4-dimensional capability vectors. Components flow through a multi-tier graph from raw materials through assembly to distribution. Delivery metrics track supplier performance over time.
3.7.2. Key Queries
Multi-Tier Supplier Discovery — Find all suppliers up to 3 tiers deep:
MATCH path = (p:Product)-[:CONTAINS*1..3]->(c:Component)<-[:SUPPLIES]-(s:Supplier)
WHERE p.name = 'SmartWatch Pro'
RETURN s.name, c.name, length(path) AS tier
Blast Radius Analysis — Assess impact of a supplier going offline:
MATCH (s:Supplier {name: 'ChipCorp'})-[:SUPPLIES]->(c:Component)<-[:CONTAINS]-(p:Product)
OPTIONAL MATCH (alt:Supplier)-[:SUPPLIES]->(c)
WHERE alt <> s
RETURN p.name, c.name, collect(alt.name) AS alternatives
Alternative Supplier Sourcing — Find suppliers with similar capabilities by embedding:
SELECT name, region, distance FROM (
SELECT expand(vectorNeighbors('Supplier[capability_vec]', [0.9, 0.8, 0.7, 0.6], 5))
)
3.7.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/supply-chain
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: supply-chain on GitHub
3.8. Identity & Access Management
Manage identity access across complex permission hierarchies while detecting security anomalies and maintaining compliance. Graph traversal resolves permissions through nested group/role hierarchies via SQL MATCH queries, vector similarity detects behavioral anomalies using access pattern embeddings with vectorNeighbors(), time-series tracking captures access audit logs for SOX/GDPR compliance reporting, and the PostgreSQL wire protocol provides Python connectivity for integration.
3.8.1. Architecture Overview
Schema: vertex, edge, and document types.
Identities belong to nested groups that hold roles. Roles grant permissions on resources. Policies govern resources for compliance. Access logs track all actions for audit trails. Identities carry 8-dimensional access pattern vectors for behavioral analysis.
3.8.2. Key Queries
Permission Resolution — Resolve all permissions for a user through group/role hierarchy:
SELECT identity.email, role.name AS role, permission.action, resource.name AS resource
FROM MATCH {type: Identity, as: identity, where: (email = '[email protected]')}
-MEMBER_OF-> {type: Group, as: grp}
-HAS_ROLE-> {type: Role, as: role}
-GRANTS-> {type: Permission, as: permission}
-APPLIES_TO-> {type: Resource, as: resource}
Shadow Admin Detection — Find users with admin-equivalent access through indirect paths:
SELECT identity.email, role.name
FROM MATCH {type: Identity, as: identity}
-MEMBER_OF-> {type: Group, as: grp, while: (true)}
-HAS_ROLE-> {type: Role, as: role, where: (name LIKE '%admin%')}
Behavioral Anomaly Detection — Identify users with unusual access patterns:
SELECT email, distance FROM (
SELECT expand(vectorNeighbors('Identity[access_pattern_vec]', [0.1,0.9,0.1,0.8,0.2,0.7,0.3,0.6], 3))
) WHERE distance > 0.5
3.8.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/iam
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: iam on GitHub
3.8.4. Related Documentation
-
BFS — Traverse permission hierarchies level by level
-
All Simple Paths — Enumerate all permission paths to a resource
-
Strongly Connected Components — Detect circular permission dependencies
3.9. Customer 360
Build a unified customer view in a single database by leveraging graph traversal for identity resolution, churn risk scoring, and cross-sell recommendations, the document model for fuzzy deduplication via cross-matching on customer attributes, vector similarity with LSM_VECTOR embeddings for customer preference and product matching, full-text search for support ticket content indexing, and time-series analysis for customer journey event chains and conversion path tracking.
3.9.1. Architecture Overview
Schema: vertex and edge types.
Customers connect to households, devices, addresses, and support tickets. Sessions contain events that trace customer journeys. Customers carry prefVector embeddings for personalization. Support tickets are full-text indexed.
3.9.2. Key Queries
Identity Resolution — Discover linked identities through shared sessions (3-hop transitive):
SELECT c1.name, c2.name, session.id AS sharedSession
FROM MATCH {type: Customer, as: c1}
-STARTED-> {type: Session, as: session}
<-STARTED- {type: Customer, as: c2}
WHERE c1 <> c2
Churn Risk Scoring — Calculate churn risk based on churned neighbors in the social network:
SELECT customer.name,
COUNT(CASE WHEN neighbor.status = 'churned' THEN 1 END) * 1.0 / COUNT(neighbor) AS churnRatio
FROM MATCH {type: Customer, as: customer}
-BELONGS_TO-> {type: Household}
<-BELONGS_TO- {type: Customer, as: neighbor}
GROUP BY customer.name ORDER BY churnRatio DESC
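The churn ratio is the fraction of a customer's household neighbors that have churned. A minimal Python sketch with invented households and statuses:

```python
# Hypothetical household neighbors per customer.
households = {
    "alice": ["bob", "carol", "dave"],
    "erin": ["frank"],
}
status = {"bob": "churned", "carol": "active", "dave": "churned", "frank": "active"}

def churn_ratio(customer):
    """Share of a customer's household neighbors with status 'churned'."""
    neighbors = households[customer]
    churned = sum(1 for n in neighbors if status[n] == "churned")
    return churned / len(neighbors)

print(churn_ratio("alice"))  # 2 of 3 neighbors churned
print(churn_ratio("erin"))
```

Ranking customers by this ratio descending matches the ORDER BY churnRatio DESC above.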
Customer Journey Analysis — Trace conversion paths from ad click to purchase:
MATCH (c:Customer)-[:STARTED]->(s:Session)-[:TRIGGERED]->(e1:Event)-[:TRIGGERED]->(e2:Event)-[:TRIGGERED]->(e3:Event)
WHERE e1.type = 'ad_click' AND e3.type = 'purchase'
RETURN c.name, e1.type, e2.type, e3.type, s.channel
3.9.3. Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/customer-360
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: customer-360 on GitHub
3.10. Getting Started with a Use Case
Clone the repository and pick any use case:
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/<use-case-name>
docker compose up -d
./setup.sh
./queries/queries.sh
3.12. Semantic Search (Planned)
| This use case is in development. Want to contribute? See the arcadedb-usecases repository. |
Standalone vector search for e-commerce product discovery and document retrieval. Demonstrates ArcadeDB’s vector capabilities without requiring graph traversal — pure semantic search with filtering, faceting, and hybrid keyword+vector ranking.
3.12.1. Planned Features
-
Vector Similarity — Product and document embeddings with LSM_VECTOR, HNSW, and DiskANN indexes
-
Full-Text Search — Hybrid keyword + semantic search with reciprocal rank fusion
-
Document Model — Faceted filtering on product attributes
-
Python — Primary implementation language targeting data science and AI workflows
3.13. Geospatial Analytics (Planned)
| This use case is in development. Want to contribute? See the arcadedb-usecases repository. |
Fleet tracking and location-based services combining geospatial indexes with graph topology and time-series movement data. Demonstrates spatial queries, route optimization, and real-time vehicle tracking.
3.13.1. Planned Features
-
Geospatial — Spatial indexes for point-in-polygon, nearest-neighbor, and distance queries
-
Graph Traversal — Road network and delivery route graph modeling
-
Time Series — Vehicle position tracking and movement analytics
-
Python and JavaScript — Dual implementation for backend analytics and frontend visualization
3.14. Content Management (Planned)
| This use case is in development. Want to contribute? See the arcadedb-usecases repository. |
A content management system combining document storage, full-text search, and graph-based taxonomies. Demonstrates how ArcadeDB replaces separate CMS databases with a single multi-model store for content, metadata, and navigation hierarchies.
3.14.1. Planned Features
-
Document Model — Content storage with flexible schemas for articles, pages, and media
-
Full-Text Search — Content indexing and search across all document types
-
Graph Traversal — Category trees, tag taxonomies, and content relationship graphs
-
JavaScript/TypeScript — Primary implementation for web application integration
3.15. Network Monitoring (Planned)
| This use case is in development. Want to contribute? See the arcadedb-usecases repository. |
Network topology modeling with time-series metrics and Grafana dashboards. Demonstrates how ArcadeDB unifies network graph analysis with performance monitoring, replacing separate topology databases and time-series stores.
3.15.1. Planned Features
-
Graph Traversal — Network topology modeling (routers, switches, links, VLANs) with path analysis
-
Time Series — Interface metrics (bandwidth, latency, packet loss) with PromQL queries
-
Vector Similarity — Anomaly detection on traffic pattern embeddings
-
Python — Primary implementation for network operations tooling
3.16. Data Lineage (Planned)
| This use case is in development. Want to contribute? See the arcadedb-usecases repository. |
Data pipeline lineage tracking with graph traversal for impact analysis, compliance reporting, and debugging data quality issues. Demonstrates modeling ETL/ELT pipelines as directed graphs with transformation metadata.
3.16.1. Planned Features
-
Graph Traversal — Pipeline DAG modeling (sources, transformations, destinations) with upstream/downstream analysis
-
Document Model — Schema snapshots and transformation metadata storage
-
Time Series — Pipeline execution history and data quality metrics
-
Python — Primary implementation for data engineering workflows
4. Explanations
4.1. Record
A record is the smallest unit you can load from and store in the database. Records come in three types:
-
Document
-
Vertex
-
Edge
Document
Documents are softly typed and are defined by schema types, but you can also use them in schema-less mode. Documents handle fields flexibly and can easily be imported and exported in JSON format. For example,
{
"name":"Jay",
"surname":"Miner",
"job":"Developer",
"creations":[{
"name":"Amiga 1000",
"company":"Commodore Inc."
},{
"name":"Amiga 500",
"company":"Commodore Inc."
}]
}
Vertex
In graph databases, the vertices (also called nodes) represent the main entities that hold information: a patient, a company, or a product. Vertices are themselves documents with some additional features. This means they can contain embedded records and arbitrary properties, exactly like documents. Vertices are connected to other vertices through edges.
Edge
An edge, or arc, is the connection between two vertices. Edges can be unidirectional or bidirectional, and one edge can only connect two vertices. Edges, like vertices, are also documents with additional features.
For more information on connecting vertices in general, see Relationships below.
Record ID
When ArcadeDB generates a record, it auto-assigns a unique identifier called a Record ID, RID for short.
The syntax for the RID is the pound symbol (#) with the bucket identifier, colon (:), and the position like so:
#<bucket-identifier>:<record-position>.
-
bucket-identifier: This number indicates the bucket id to which the record belongs. Positive numbers in the bucket identifier indicate persistent records. You can have up to 2,147,483,648 buckets in a database.
-
record-position: This number defines the absolute position of the record in the bucket.
A special case is #-1:-1 symbolizing the null RID.
The prefix character # is mandatory.
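Parsing a RID is a simple string split. A small Python sketch of the format described above (the helper name parse_rid is ours, not an ArcadeDB API):

```python
import re

# "#<bucket-identifier>:<record-position>", both parts possibly negative (null RID).
RID_PATTERN = re.compile(r"^#(-?\d+):(-?\d+)$")

def parse_rid(rid):
    """Split a RID like '#12:1000000' into (bucket_id, position)."""
    m = RID_PATTERN.match(rid)
    if not m:
        raise ValueError(f"invalid RID: {rid}")
    return int(m.group(1)), int(m.group(2))

print(parse_rid("#12:1000000"))  # (12, 1000000)
print(parse_rid("#-1:-1"))       # the null RID
```

Note the pattern rejects a RID without the mandatory # prefix.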
Each Record ID is immutable, universal, and only reused when configured to be, see bucketReuseSpaceMode.
Additionally, records can be accessed directly through their RIDs with O(1) complexity, which means the lookup speed is constant, unaffected by database size.
For this reason, you don’t need to create a field to serve as the primary key, as you do in relational databases.
Record Retrieval Complexity
Retrieving a record by RID has complexity O(1).
This is possible because the RID itself encodes both the file a record is stored in and the position inside it.
In a RID such as #12:1000000, the bucket identifier (here 12) specifies the record’s associated file, while the record position (here 1000000) describes the position inside the file.
Bucket files are organized in pages (with a default size of 64KB) and a maximum number of records per page (by default 2048).
To determine the position of a record in a bucket file, the rounded-down quotient of record position and maximum records per page yields the page (here ⌊1000000 / 2048⌋ = 488), and the remainder gives the slot on the page (here 1000000 mod 2048 = 576).
In pseudo-code this computation is given by:
int pageId = rid.getPosition() / maxRecordsInPage;          // integer division rounds down
int positionInPage = rid.getPosition() % maxRecordsInPage;  // remainder
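The same computation as runnable Python, using the default of 2048 records per page:

```python
MAX_RECORDS_IN_PAGE = 2048  # default maximum records per page

def locate(record_position):
    """Map a record position to (page id, slot within the page)."""
    return record_position // MAX_RECORDS_IN_PAGE, record_position % MAX_RECORDS_IN_PAGE

print(locate(1000000))  # (488, 576)
```

So record #12:1000000 lives in page 488 of the bucket-12 file, at slot 576 of that page.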
4.2. Types
The concept of a type is taken from the object-oriented programming paradigm, where it is sometimes known as a "class". In ArcadeDB, types define records. The concept is closest to a "table" in relational databases and a "class" in object databases.
Types can be schema-less, schema-full, or a mix. They can inherit from other types, creating a tree of types. Inheritance, in this context, means that a subtype extends a parent type, inheriting all of its properties and attributes. Practically, this is done by extending a type or setting a super-type.
Each type has its own buckets (data files). A type can support multiple buckets. When you execute a query against a type, it automatically fetches from all the buckets that are part of the type. When you create a new record, ArcadeDB selects the bucket to store it in using a configurable strategy.
By default, ArcadeDB creates one bucket per type, but it can be configured to create, for example, as many buckets as the host machine has cores (processors), see typeDefaultBuckets.
This way, CRUD operations can run at full speed in parallel, with zero contention between CPUs and cores.
Having many buckets per type means having more files at file system level.
Check if your operating system has any limitation with the number of files supported and opened at the same time (ulimit for Unix-like systems).
You can query the defined types by executing the following SQL query: select from schema:types.
4.3. Buckets
Where types provide you with a logical framework for organizing data, buckets provide physical or in-memory space in which ArcadeDB actually stores the data. Each bucket is one file at file system level. It is comparable to the "collection" in Document databases, the "table" in Relational databases and the "cluster" in OrientDB. You can have up to 2,147,483,648 buckets in a database.
A bucket can only be part of one type, meaning two types cannot share the same bucket. Also, sub-types have buckets separate from those of their super-types.
When you create a new type, the CREATE TYPE statement automatically creates the physical buckets (files) that serve as the default location in which to store data for that type.
ArcadeDB forms the bucket names using the type name + underscore + a sequential number starting from 0. For example, the first bucket for the type Beer will be Beer_0, and the corresponding file in the file system will be Beer_0.31.65536.bucket.
By default ArcadeDB creates one bucket per type.
For massive inserts, performance can be improved by creating additional buckets and hence taking advantage of parallelism, i.e. by creating one bucket for each CPU core on the server.
Types vs. Buckets in Queries
The combination of types and buckets is very powerful and serves a number of use cases. In most cases, you can simply work with types and you will be fine. But if you can split your database into multiple buckets, you can address a specific bucket instead of the entire type. Dividing your database wisely across buckets, in a way that helps retrieval, can reduce or even eliminate the need for indexes. Indexes slow down insertion and take space on disk and in RAM. In most cases you need indexes to speed up your queries, but in some use cases you can partially or entirely avoid them and still get good query performance.
One bucket per period
Consider an example where you create a type Invoice with one bucket per year: Invoice_2015 and Invoice_2016.
You can query all invoices using the type as a target with the SELECT statement.
SELECT FROM Invoice
In addition to this, you can filter the result set by the year.
The type Invoice includes a year field, you can filter it through the WHERE clause.
SELECT FROM Invoice WHERE year = 2016
You can also query specific records from a single bucket.
By splitting the type Invoice across multiple buckets (that is, one per year in our example), you can optimize the query by narrowing the potential result set.
SELECT FROM BUCKET:Invoice_2016
By using the explicit bucket instead of the logical type, this query runs significantly faster, because ArcadeDB can narrow the search to the targeted bucket.
No index is needed on the year, because all the invoices for year 2016 will be stored in the bucket Invoice_2016 by the application.
One bucket per location
Like with the example above, we could split our records by location creating one bucket per location. Example:
CREATE BUCKET Customer_Europe
CREATE BUCKET Customer_Americas
CREATE BUCKET Customer_Asia
CREATE BUCKET Customer_Other
CREATE VERTEX TYPE Customer BUCKET Customer_Europe,Customer_Americas,Customer_Asia,Customer_Other
Here we are using the graph model by creating a vertex type, but the same works with documents: use CREATE DOCUMENT TYPE instead.
Now, in your application, store the vertices or documents in the right bucket based on the customer’s location. You can use any API and set the bucket. If you’re using SQL, this is how you can insert a new customer into a specific bucket.
INSERT INTO BUCKET:Customer_Europe CONTENT { firstName: 'Enzo', lastName: 'Ferrari' }
Since a bucket can only be part of one type, when you use the bucket notation with SQL, the type is inferred from the bucket, "Customer" in this case.
When you’re looking for customers based in Europe, you could execute this query:
SELECT FROM BUCKET:Customer_Europe
You can be even more specific by creating a bucket per country, not just per continent, and query from that bucket. Example:
CREATE BUCKET 'Customer_Europe_Italy'
CREATE BUCKET 'Customer_Europe_Spain'
Now get all the customers that live in Italy:
SELECT FROM BUCKET:Customer_Europe_Italy
You can also specify a list of buckets in your query. This is the query to retrieve both Italian and Spanish customers.
SELECT FROM BUCKET:[Customer_Europe_Italy,Customer_Europe_Spain]
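In application code, routing a record to the right bucket reduces to a lookup table. A Python sketch that builds the INSERT statement for a customer's country (the routing map and helper function are illustrative, not an ArcadeDB API):

```python
# Hypothetical country-to-bucket routing table for the Customer type.
BUCKET_BY_COUNTRY = {
    "IT": "Customer_Europe_Italy",
    "ES": "Customer_Europe_Spain",
    "US": "Customer_Americas",
}

def insert_statement(country, first_name, last_name):
    """Compose the SQL INSERT targeting the bucket for the given country."""
    bucket = BUCKET_BY_COUNTRY.get(country, "Customer_Other")  # fallback bucket
    return (f"INSERT INTO BUCKET:{bucket} "
            f"CONTENT {{ firstName: '{first_name}', lastName: '{last_name}' }}")

print(insert_statement("IT", "Enzo", "Ferrari"))
```

Because each bucket belongs to exactly one type, the statement never needs to name the Customer type explicitly.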
4.4. Relationships
ArcadeDB supports three kinds of relationships: graph connections, referenced, and embedded. It can manage relationships in schema-full or schema-less scenarios.
Graph Connections
As a graph database, spanning edges between vertices is one way to express a connection between records. This is the graph model’s natural form of relationships, traversable by the SQL, Gremlin, and Cypher query languages. Internally, ArcadeDB stores a direct (referenced) relationship for edge-wise connected vertices to ensure fast graph traversals.
Example
In ArcadeDB’s SQL, edges are created via the CREATE EDGE command.
Referenced Relationships
In relational databases, tables are linked through JOIN operations, which can prove costly in computing resources.
ArcadeDB manages relationships natively: instead of computing a JOIN, it stores a direct LINK to the target object of the relationship.
This boosts the load speed of an entire graph of connected objects, as in graph and object database systems.
Example
Note that referenced relationships differ from edges: references are properties connecting any records, while edges are types connecting vertices. In particular, graph traversal is only applicable to edges.
Embedded Relationships
When using Embedded relationships, ArcadeDB stores the relationship within the record that embeds it. These relationships are stronger than Reference relationships. You can represent it as a UML Composition relationship.
Embedded records do not have their own RID, so they cannot be referenced by other records; they are only accessible directly through the container record. Furthermore, an embedded record is stored inside the embedding record, not in an embedded record type’s bucket. Hence, if you delete the container record, the embedded record is also deleted. For example,
Here, record A contains the entirety of record B in the property address.
You can reach record B only by traversing the container record.
For example,
SELECT FROM Account WHERE address.city = 'Rome'
1:1 and n:1 Embedded Relationships
ArcadeDB expresses relationships of these kinds using the EMBEDDED type.
1:n and n:n Embedded Relationships
ArcadeDB expresses relationships of these kinds using a list or a map of links, such as:
-
LIST: An ordered list of records.
-
MAP: An ordered map with records as values and strings as keys; it does not accept duplicate keys.
Inverse Relationships
In ArcadeDB, all edges in the graph model are bidirectional, and ArcadeDB automatically maintains the consistency of all bidirectional relationships. This differs from the document model, where relationships are always unidirectional and the developer must maintain data integrity.
Edge Constraints
ArcadeDB supports edge constraints, which means limiting the admissible vertex types that can be connected by an edge type.
To this end the implicit metadata properties @in and @out need to be made explicit by creating them.
For example, for an edge type HasParts that is supposed to connect only from vertices of type Product to vertices of type Component, this can be schemed by:
CREATE EDGE TYPE HasParts;
CREATE PROPERTY HasParts.`@out` link OF Product;
CREATE PROPERTY HasParts.`@in` link OF Component;
Relationship Traversal Complexity
As a native graph database, ArcadeDB supports index-free adjacency. This means constant graph traversal complexity of O(1), independent of the graph’s expanse (database size).
To traverse a graph structure, one follows references stored in the current record. These references are always stored as RIDs, and they point not only to incoming and outgoing edges, but also to the connected vertices. Internally, references are managed as a stack (also known as LIFO), which yields the latest insertion first. Since connected vertices are stored alongside edges, neighboring vertices can be reached directly, without going through the connecting edge. This is useful when edges purely connect vertices and do not carry properties themselves.
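A simplified model of this layout in Python: each vertex entry keeps RID lists for both its edges and the vertices those edges reach, so neighbor lookup never needs to load an edge record (the dictionary structure is illustrative, not ArcadeDB's on-disk format):

```python
# Each vertex stores RIDs of its outgoing edges AND of the vertices they reach,
# so neighbors are reachable without loading the edge records.
adjacency = {
    "#10:0": {"out_edges": ["#20:0", "#20:1"], "out_vertices": ["#10:1", "#10:2"]},
    "#10:1": {"out_edges": [], "out_vertices": []},
    "#10:2": {"out_edges": [], "out_vertices": []},
}

def neighbors(rid):
    """O(1) neighbor lookup: read the stored vertex RIDs, skip the edges."""
    return adjacency[rid]["out_vertices"]

print(neighbors("#10:0"))  # ['#10:1', '#10:2']
```

The edge RIDs (#20:x) would only be loaded if the traversal needed edge properties.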
4.5. Database
Each server or Java VM can handle multiple database instances, but the database name must be unique.
Database URL
ArcadeDB uses its own URL format of engine and database name: <engine>:<db-name>.
The embedded engine is the default and can be omitted.
To open a database on the local file system, you can use the path directly as the URL.
Database Usage
You must always close the database once you finish working with it.
ArcadeDB automatically closes all opened databases when the process terminates gracefully (not when it is killed by force).
This is assured if the operating system allows a graceful shutdown, for example on Unix/Linux systems via SIGTERM (Docker exit code 143), as opposed to SIGKILL (Docker exit code 137).
4.6. Transactions
A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
-
to provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status
-
to provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs’ outcomes are possibly erroneous.
"A database transaction, by definition, must be atomic, consistent, isolated and durable. Database practitioners often refer to these properties of database transactions using the acronym ACID." - Wikipedia
ArcadeDB is an ACID compliant DBMS.
| ArcadeDB keeps the transaction in the host’s RAM, so the transaction size is limited by the available RAM (heap memory) of the JVM. For transactions involving many records, consider splitting them into multiple transactions. |
ACID Properties
Atomicity
"Atomicity requires that each transaction is 'all or nothing': if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen." - Wikipedia
Consistency
"The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors do not violate any defined rules." - Wikipedia
ArcadeDB uses MVCC (Multi-Version Concurrency Control) to assure consistency by versioning the pages where records are stored.
Look at this example:
| Sequence | Client/Thread 1 | Client/Thread 2 | Version of page containing record X |
|---|---|---|---|
| 1 | Begin of Transaction | | |
| 2 | read(x) | | 10 |
| 3 | | Begin of Transaction | |
| 4 | | read(x) | 10 |
| 5 | | write(x) | 10 |
| 6 | | commit | 10 → 11 |
| 7 | write(x) | | 10 |
| 8 | commit | | 10 → 11 = Error, x is already at version 11 in the database |
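The sequence above can be sketched in a few lines of Python (an illustrative model of the MVCC version check, not ArcadeDB's Java implementation; all names here are hypothetical):

```python
class ConcurrentModificationError(Exception):
    pass

class Store:
    """Simplified store: one page holding record x, with a version counter."""
    def __init__(self):
        self.value = 10
        self.version = 10  # page version, as in the table above

class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_version = None
        self.pending = None

    def read(self):
        self.read_version = self.store.version
        return self.store.value

    def write(self, value):
        self.pending = value

    def commit(self):
        # MVCC check: fail if the page changed since we read it
        if self.store.version != self.read_version:
            raise ConcurrentModificationError(
                f"page is at v{self.store.version}, expected v{self.read_version}")
        self.store.value = self.pending
        self.store.version += 1  # 10 -> 11

store = Store()
t1, t2 = Transaction(store), Transaction(store)
t1.read(); t2.read()          # both read x at page version 10
t2.write(42); t2.commit()     # t2 commits first: page goes 10 -> 11
t1.write(7)
try:
    t1.commit()               # t1's check fails: page is already at 11
except ConcurrentModificationError as e:
    print("conflict:", e)
```

The losing transaction receives the error and can be retried by the application, as described under Optimistic Transaction below.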
Isolation
"The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction." - Wikipedia
The SQL standard defines the following phenomena, which are prohibited at the various isolation levels:
-
Dirty Read: a transaction reads data written by a concurrent uncommitted transaction. This is never possible with ArcadeDB.
-
Non Repeatable Read: a transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
-
Phantom Read: a transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. This happens also when records are deleted or inserted during the transaction and they could become visible during the transaction.
The SQL standard transaction isolation levels are described in the table below:
| Isolation Level | Dirty Read | Non Repeatable Read | Phantom Read |
|---|---|---|---|
| READ COMMITTED | Not possible | Possible | Possible |
| REPEATABLE READ | Not possible | Not possible | Possible |
The SQL SERIALIZABLE level is not supported by ArcadeDB.
When using remote access, all commands are executed on the server, hence outside of the client's transaction scope.
See below for more information.
Look at these examples:
| Sequence | Client/Thread 1 | Client/Thread 2 |
|---|---|---|
| 1 | Begin of Transaction | |
| 2 | read(x) | |
| 3 | | Begin of Transaction |
| 4 | | read(x) |
| 5 | | write(x) |
| 6 | | commit |
| 7 | read(x) | |
| 8 | commit | |
At operation 7, client 1 still reads the same version of x it read at operation 2.
| Sequence | Client/Thread 1 | Client/Thread 2 |
|---|---|---|
| 1 | Begin of Transaction | |
| 2 | read(x) | |
| 3 | | Begin of Transaction |
| 4 | | read(y) |
| 5 | | write(y) |
| 6 | | commit |
| 7 | read(y) | |
| 8 | commit | |
At operation 7, client 1 reads the version of y written at operation 6 by client 2, because it had never read y before.
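The snapshot behavior of both examples can be modeled with a per-transaction read cache (an illustrative Python sketch, not the actual engine code):

```python
class Database:
    def __init__(self):
        self.committed = {"x": 10, "y": 5}

class Tx:
    """Repeatable read via a private snapshot cache: the first read of a key
    copies it from the committed store; later reads reuse the copy."""
    def __init__(self, db):
        self.db = db
        self.snapshot = {}

    def read(self, key):
        if key not in self.snapshot:
            self.snapshot[key] = self.db.committed[key]
        return self.snapshot[key]

db = Database()
t1 = Tx(db)
assert t1.read("x") == 10     # first read of x: value is cached
db.committed["x"] = 99        # another transaction commits a new x
db.committed["y"] = 77        # ...and a new y
assert t1.read("x") == 10     # repeatable: t1 still sees its snapshot of x
assert t1.read("y") == 77     # y was never read before, so the new value is seen
```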
Durability
"Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory." - Wikipedia
Fail-over
An ArcadeDB instance can fail for several reasons:
-
Hardware problems, such as loss of power or disk error
-
Software problems, such as an operating system crash
-
Application problems, such as a bug that crashes your application that is connected to the ArcadeDB engine.
You can use the ArcadeDB engine directly in the same process of your application. This gives superior performance due to the lack of inter-process communication. In this case, should your application crash (for any reason), the ArcadeDB engine also crashes.
If you’re connected remotely to an ArcadeDB server and your application crashes while the server keeps running, any pending transactions owned by the client are rolled back.
Auto-recovery
At start-up, the ArcadeDB engine checks whether it is restarting from a crash. In this case, the auto-recovery phase starts and rolls back all pending transactions.
ArcadeDB has different levels of durability based on storage type, configuration and settings.
WAL Flush and Durability
ArcadeDB uses a Write-Ahead Log (WAL) to guarantee transaction durability.
The arcadedb.txWalFlush setting controls whether the WAL is flushed (fsynced) to disk at commit time:
| Value | Behavior | Durability guarantee |
|---|---|---|
| 0 | No flush. The WAL is written to the OS page cache but not fsynced. | Safe against process crashes (the OS page cache survives). Not safe against power loss or OS crash: committed transactions may be lost. |
| 1 | Flush without metadata (YES_NOMETADATA). | Safe against power loss. Recommended for production. |
| 2 | Full flush (YES_FULL). | Maximum durability guarantee. Slightly slower than 1. |
In production server mode (arcadedb.server.mode=production), ArcadeDB automatically sets txWalFlush=1 if you have not explicitly configured it.
This ensures that production deployments are durable by default.
If you explicitly set txWalFlush=0 in production mode, a warning is logged at startup.
The WAL flush setting can also be changed per-database or per-transaction via the Java API:
// Per-database
database.setWALFlush(WALFile.FlushType.YES_NOMETADATA);
// Per-transaction
database.getTransaction().setWALFlush(WALFile.FlushType.YES_FULL);
// Per async executor
database.async().setTransactionSync(WALFile.FlushType.YES_NOMETADATA);
If your storage hardware has battery-backed write cache (BBU/BBWC) or power-loss protection (common in enterprise SSDs and cloud block storage like AWS EBS), txWalFlush=0 is safe even in production because the hardware guarantees that buffered writes reach persistent storage on power loss.
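The three flush levels can be illustrated with plain file I/O (a Python sketch; mapping the flush modes to fdatasync/fsync is an analogy to the FlushType constants, not ArcadeDB's implementation):

```python
import os, tempfile

def wal_append(path, record, flush="full"):
    """Append a record to a toy write-ahead log.
    flush='none' -> data reaches the OS page cache only (like txWalFlush=0)
    flush='data' -> fdatasync: file data forced to disk (like YES_NOMETADATA)
    flush='full' -> fsync: data and metadata forced to disk (like YES_FULL)"""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()                      # user-space buffer -> OS page cache
        if flush == "data" and hasattr(os, "fdatasync"):
            os.fdatasync(f.fileno())   # survives power loss (data only)
        elif flush == "full":
            os.fsync(f.fileno())       # survives power loss (data + metadata)

wal = os.path.join(tempfile.mkdtemp(), "tx.wal")
wal_append(wal, b"BEGIN tx1", flush="none")
wal_append(wal, b"COMMIT tx1", flush="full")
with open(wal, "rb") as f:
    print(f.read())
```

The commit record is the one that must be durable: if it is fsynced, recovery can replay the whole transaction from the log.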
Optimistic Transaction
This mode uses the well-known Multi-Version Concurrency Control (MVCC), allowing multiple concurrent reads and writes on the same records.
The integrity check is made at commit time.
If the record has been saved by another transaction in the interim, a ConcurrentModificationException is thrown.
The application can then choose either to repeat the transaction or abort it.
| ArcadeDB keeps the whole transaction in the host’s RAM, so the transaction size is limited by the available RAM (heap memory) of the JVM. For transactions involving many records, consider splitting them into multiple transactions. |
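A typical client-side pattern is to wrap the transaction in a retry loop (an illustrative Python sketch; the exception and helper names mirror, but are not, the Java API):

```python
class ConcurrentModificationException(Exception):
    pass

def run_in_retry_loop(work, max_retries=3):
    """Re-run `work` until it commits without an MVCC conflict."""
    for attempt in range(max_retries):
        try:
            return work()
        except ConcurrentModificationException:
            if attempt == max_retries - 1:
                raise
            # reload the affected records and try again

conflicts = iter([True, True, False])   # simulate: first two attempts conflict
def work():
    if next(conflicts):
        raise ConcurrentModificationException()
    return "committed"

assert run_in_retry_loop(work) == "committed"
```

Because each retry re-reads the records, the retried transaction operates on the latest committed versions.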
Nested transactions and propagation
ArcadeDB supports nested transactions.
If begin() is called after a transaction has already begun, the new transaction becomes the current one until commit or rollback.
When this nested transaction completes, the previous transaction becomes the current transaction again.
4.7. Inheritance
Unlike many object-relational mapping tools, ArcadeDB does not split documents between different types. Each document resides in one or a number of buckets associated with its specific type. When you execute a query against a type that has subtypes, ArcadeDB searches the buckets of the target type and all subtypes.
Declaring Inheritance in Schema
In developing your application, bear in mind that ArcadeDB needs to know the type inheritance relationship.
For example,
DocumentType account = database.getSchema().createDocumentType("Account");
DocumentType company = database.getSchema().createDocumentType("Company").addSuperType(account);
Using Polymorphic Queries
By default, ArcadeDB treats all queries as polymorphic. Using the example above, you can run the following query from the console:
SELECT FROM Account WHERE name.toUpperCase() = 'GOOGLE'
This query returns all instances of the types Account and Company whose name property matches 'Google' in any letter case.
How Inheritance Works
Consider an example, where you have three types, listed here with the bucket identifier in the parentheses.
By default, ArcadeDB creates a separate bucket for each type.
The defaultBucketId property of a type indicates the bucket used by default when none is specified.
However, a type has a property bucketIds, (as int[]), that contains all the buckets able to contain the records of that type. The properties bucketIds and defaultBucketId are the same by default.
When you execute a query against a type, ArcadeDB limits the result-sets to only the records of the buckets contained in the bucketIds property.
For example,
SELECT FROM Account WHERE name.toUpperCase() = 'GOOGLE'
This query returns all the records with the name property set to GOOGLE from all three types, given that the base type Account was specified.
For the type Account, ArcadeDB searches inside the buckets 10, 13 and 27, following the inheritance specified in the schema.
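The bucket resolution can be pictured as a recursive union over the type hierarchy (a hypothetical Python sketch; the third type name "Client" is invented for illustration, only the bucket ids 10, 13, and 27 come from the text):

```python
class DocType:
    def __init__(self, name, bucket_ids):
        self.name = name
        self.bucket_ids = list(bucket_ids)
        self.subtypes = []

    def add_subtype(self, t):
        self.subtypes.append(t)
        return t

    def polymorphic_buckets(self):
        """Buckets searched by a query on this type: its own buckets plus
        those of every subtype, recursively."""
        buckets = list(self.bucket_ids)
        for sub in self.subtypes:
            buckets += sub.polymorphic_buckets()
        return buckets

account = DocType("Account", [10])
company = account.add_subtype(DocType("Company", [13]))
company.add_subtype(DocType("Client", [27]))
print(account.polymorphic_buckets())  # [10, 13, 27]
```

A query on Account scans all three buckets; a query on Company would scan only buckets 13 and 27.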
4.8. Schema
ArcadeDB supports schema-less, schema-full and hybrid operation. This means the declared schema is enforced for all types that have one. Minimally, a type (document, vertex or edge) needs to be declared in a database before records can be inserted into it, see CREATE TYPE (SQL) and ALTER TYPE (SQL). Beyond the type declaration, properties with or without constraints can be declared, see CREATE PROPERTY (SQL) and ALTER PROPERTY (SQL). When inserting into a declared type, all records are accepted (even with additional undeclared properties) as long as no property constraints are violated.
4.9. Indexes
ArcadeDB supports multiple index algorithms, each optimized for different access patterns:
-
LSM Tree (default) — Optimized for range scans, ordered iteration, and write-heavy workloads.
-
Hash Index — O(1) equality lookups using extendable hashing. Best for primary key access, JOINs, and edge traversal where ordering is not needed.
4.9.1. LSM Tree algorithm
LSM tree is a type of data structure that is used to store and retrieve data efficiently. It works by organizing data in a tree-like structure, where each node in the tree represents a certain range of data.
Here’s how it works:
-
When you want to store a piece of data in the LSM tree, it first goes into a special part of the tree called a "write buffer." The write buffer is like a temporary storage area where new data is kept until it’s ready to be added to the tree.
-
When the write buffer gets full, the LSM tree will "flush" the data from the write buffer into the main part of the tree. This is done by creating a new node in the tree and adding the data from the write buffer to it.
-
As more and more data is added to the tree, it will eventually become too large to be stored in memory (this is known as "overflowing"). When this happens, the LSM tree will start to "compact" the data by moving some of it to disk storage. This allows the tree to continue growing without running out of memory.
-
When you want to retrieve a piece of data from the LSM tree, the algorithm will search for it in the write buffer, the main part of the tree, and any data that has been compacted to disk storage. If the data is found, it will be returned to you.
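The steps above can be condensed into a toy index (a Python sketch of the general LSM idea, not ArcadeDB's on-disk format):

```python
import bisect

class TinyLSM:
    """Toy LSM index: a write buffer (memtable) plus sorted, immutable runs."""
    def __init__(self, buffer_limit=4):
        self.buffer = {}          # recent writes, unsorted
        self.runs = []            # list of sorted (key, value) runs on "disk"
        self.buffer_limit = buffer_limit

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # seal the buffer into a sorted, immutable run (newest run first)
        self.runs.insert(0, sorted(self.buffer.items()))
        self.buffer = {}

    def get(self, key):
        # search the write buffer first, then the runs from newest to oldest
        if key in self.buffer:
            return self.buffer[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

idx = TinyLSM()
for i in range(6):
    idx.put(f"k{i}", i)
print(idx.get("k1"), idx.get("k5"))  # k1 was flushed to a run, k5 is still buffered
```

A real LSM tree additionally merges (compacts) runs in the background; the lookup order, buffer first and then newest to oldest runs, is the essential idea.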
4.9.2. LSM Tree vs B+Tree
B+Tree is the most common algorithm used by relational DBMSs. What are the differences?
-
LSM tree and B+ tree are both data structures that are commonly used to store and retrieve data efficiently. Here are some of the main advantages of LSM tree over B+ tree:
-
LSM tree is more efficient for writes: LSM tree uses a write buffer to temporarily store new data, which allows it to batch writes and reduce the number of disk accesses required. This can make it faster than B+ tree for inserting large amounts of data.
-
LSM tree is more efficient for compaction: Because LSM tree stores data in a sorted fashion, it can compact data more efficiently by simply merging sorted data sets. B+ tree, on the other hand, requires more complex rebalancing operations when compacting data.
-
LSM tree is more space-efficient: LSM tree stores data in a compact, sorted format, which can make it more space-efficient than B+ tree. This can be especially useful when storing large amounts of data on disk.
-
However, there are also some potential disadvantages of LSM tree compared to B+ tree. For example, B+ tree may be faster for queries that require range scans or random access, and it may be easier to implement in some cases.
If you’re interested in the details of ArcadeDB’s LSM-Tree index implementation, look at LSM-Tree.
4.9.3. Hash Index algorithm
ArcadeDB’s Hash Index uses extendable hashing, a disk-oriented algorithm that provides O(1) equality lookups with typically 1-2 page reads.
How it works
-
Each key is hashed to produce a binary hash code. The index maintains a global depth — the number of leading bits used from each hash.
-
A directory maps each possible bit prefix (2^globalDepth entries) to a bucket page. Multiple directory entries can point to the same bucket if the bucket’s local depth is less than the global depth.
-
To look up a key, the index hashes the key, reads the directory entry for the corresponding prefix, and reads the target bucket page. The key is found via binary search within the bucket.
-
When a bucket overflows, it splits: the bucket’s local depth increases by one, and entries are redistributed based on the additional bit. If the local depth exceeds the global depth, the directory doubles in size.
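The directory and bucket mechanics can be demonstrated with a small in-memory model (a Python sketch of textbook extendable hashing, not ArcadeDB's page-level implementation):

```python
class ExtendibleHash:
    """Toy extendable hashing: a directory of 2**global_depth entries maps
    hash prefixes to buckets; an overflowing bucket splits locally."""
    BUCKET_SIZE = 2

    def __init__(self):
        self.global_depth = 1
        b0 = {"depth": 1, "items": {}}
        b1 = {"depth": 1, "items": {}}
        self.directory = [b0, b1]

    def _index(self, key):
        # use the low `global_depth` bits of the hash as directory index
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)]["items"].get(key)

    def put(self, key, value):
        bucket = self.directory[self._index(key)]
        bucket["items"][key] = value
        if len(bucket["items"]) > self.BUCKET_SIZE:
            self._split(bucket)

    def _split(self, bucket):
        if bucket["depth"] == self.global_depth:
            self.directory += self.directory      # double the directory
            self.global_depth += 1
        bucket["depth"] += 1
        new_bucket = {"depth": bucket["depth"], "items": {}}
        items, bucket["items"] = bucket["items"], {}
        # repoint directory entries whose extra bit selects the new bucket
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket["depth"] - 1)) & 1:
                self.directory[i] = new_bucket
        for k, v in items.items():                # redistribute by the new bit
            self.directory[self._index(k)]["items"][k] = v

h = ExtendibleHash()
for i in range(10):
    h.put(i, i * i)
print(h.get(7), h.global_depth)
```

Note how only the overflowing bucket is split; the directory doubles only when the local depth would exceed the global depth, exactly as described above.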
When to use Hash Index vs LSM Tree
| Use Case | Hash Index | LSM Tree |
|---|---|---|
| Point lookup (equality) | Best — O(1), 1-2 page reads | O(log N), multiple page reads |
| JOINs and edge traversal | Best — constant-time resolution | Good |
| Range scans (range predicates) | Not supported | Best — ordered iteration |
| Ordered iteration over keys | Not supported | Best — natural ordering |
| Bulk insertion throughput | Consistent — no compaction | May degrade at scale due to compaction |
Key properties
-
No compaction needed: Unlike LSM Tree, there are no background merge operations and no write amplification.
-
Local splits: When a bucket is full, only that bucket splits — no global reorganization.
-
Supports unique and non-unique: Both UNIQUE_HASH and NOTUNIQUE_HASH modes are available.
-
Null strategy: Supports the same NULL_STRATEGY options as LSM Tree (SKIP, ERROR, INDEX).
4.9.4. Case-Insensitive Indexes (COLLATE CI)
By default, ArcadeDB indexes are case-sensitive: "Hello" and "hello" are treated as different keys.
You can create case-insensitive indexes by using the COLLATE CI clause in the CREATE INDEX statement.
When COLLATE CI is specified for a property, the index stores values internally in lowercase (using the root locale).
This means:
-
Equality lookups are case-insensitive: WHERE Name = 'Hello World' matches "Hello World", "HELLO WORLD", "hello world", etc.
-
Range queries are case-insensitive: WHERE Name > 'a' uses the lowercased form for comparison.
-
Unique constraints are case-insensitive: A UNIQUE index with COLLATE CI prevents inserting both "Admin" and "admin".
-
Original values are preserved: The document still stores the original cased value; only the index key is lowercased.
Syntax
CREATE INDEX ON <type> (<property> COLLATE CI) <index-type>
In composite indexes, COLLATE CI can be specified independently per property:
CREATE INDEX ON Product (Name COLLATE CI, Code) UNIQUE
Here Name is case-insensitive while Code remains case-sensitive.
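The behavior of a COLLATE CI unique index on the Name property can be modeled in a few lines (illustrative Python, not the engine's collation code):

```python
class CaseInsensitiveUniqueIndex:
    """Sketch of a COLLATE CI unique index: keys are lowercased (str.lower
    stands in for root-locale lowercasing); the stored record keeps its
    original case."""
    def __init__(self):
        self.entries = {}   # lowercased key -> record

    def insert(self, name, record):
        key = name.lower()          # conversion happens once, at insert time
        if key in self.entries:
            raise ValueError(f"duplicate key (case-insensitive): {name!r}")
        self.entries[key] = record

    def lookup(self, name):
        return self.entries.get(name.lower())

idx = CaseInsensitiveUniqueIndex()
idx.insert("Admin", {"Name": "Admin"})
print(idx.lookup("ADMIN"))               # found via the lowercased key
try:
    idx.insert("admin", {"Name": "admin"})   # violates the CI unique constraint
except ValueError as e:
    print(e)
```

The record returned by the lookup still carries the original casing ("Admin"), only the index key was normalized.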
Query optimizer support
The query optimizer automatically recognizes queries that use .toLowerCase() on a property with a COLLATE CI index.
For example:
SELECT FROM Product WHERE Name.toLowerCase() = 'hello world'
This query will use the case-insensitive index on Name instead of performing a full scan, avoiding the overhead of calling toLowerCase() on every record.
When to use
-
User-facing lookups: Usernames, email addresses, product names — where the end user expects case-insensitive matching.
-
Deduplication: When you want a unique constraint that ignores case differences.
-
Replacing toLowerCase() patterns: If your queries already use .toLowerCase() for case-insensitive matching, a COLLATE CI index is more efficient because the conversion happens once at insert time, not on every query.
4.9.5. Index Property Types
ArcadeDB indexes can index property fields in various ways:
-
All properties (scalar or collections) can be indexed in a unique or non-unique index (as a whole).
-
An index can be for a single property, or multiple properties in a compound index.
-
String properties can be tokenized and indexed in a full-text index.
-
Object properties can be indexed by keys or by values.
-
List properties can be indexed by values.
-
Vectors of floats can be indexed using specialized vector indexes (HNSW or LSMVectorIndex).
-
WKT geometry strings can be indexed using a geospatial index.
LSMVectorIndex
LSMVectorIndex is a vector index built on ArcadeDB’s LSM Tree architecture, providing efficient persistent storage and retrieval of vector embeddings. Key features include:
-
Persistent Storage: Vector indexes are stored on disk with automatic page management and compaction
-
Configurable Similarity Functions: Supports COSINE (default), DOT_PRODUCT, and EUCLIDEAN distance metrics
-
Configurable ID Property: The property used to identify vertices can be customized (defaults to "id")
-
Multi-Language Query Support: Query vectors using vector.neighbors() from SQL or CALL db.index.vector.queryNodes() from Cypher
-
Automatic Compaction: Efficiently reclaims disk space through automatic compaction of immutable pages
-
High Performance: Leverages LSM Tree benefits for write efficiency and space optimization at scale
4.10. Graph Database
ArcadeDB is a native graph database. In this section we explain what this means and how it relates to applications.
Graph Components
Essentially, a graph is a tuple, or pair of sets, of vertices (aka nodes) and edges (aka arcs), where the set of vertices contains an (indexed) set of objects, and the set of edges contains (at least) pairs specifying each edge’s endpoints.
A particular type of graph is the directed graph, which is characterized by oriented edges, meaning each edge’s pair is ordered. A simple example of a directed graph consists of two vertices connected by a single directed edge,
with:
-
Vertices: V = {v_1, v_2}
-
Edges: E = {e_(1,2)}
Graph Database Types
There are two prevalent data models for graph databases (both usually directed):
-
Triple store (RDF graph): data is represented as subject–predicate–object triples; all facts are uniform triples and semantics are standardized (RDF, RDFS/OWL, SPARQL).
-
Property Graph: vertices and edges carry labels (types) and arbitrary key/value properties directly attached to them.
ArcadeDB is a property graph database: vertices and edges have labels and can store arbitrary key/value properties, supporting compact storage, fast traversals, and flexible schema evolution within its multi-model, document‑oriented engine.
Practically, a vertex or edge consists of an identifier (RID), a label (type), and properties (document), plus ordered endpoint references. For technical reasons, these endpoint references are stored as vertex properties listing incoming and outgoing edges, rather than as edge properties holding ordered vertex pairs.
Why (Property) Graph Databases?
-
The modeled domain is already a network.
-
Fast traversal of relations instead of costly joins in relational databases.
-
Naturally annotated edges instead of inconvenient reification in RDF graphs.
4.10.1. See Also
-
Recommendation Engine — Use case combining graph traversals with collaborative filtering
-
Fraud Detection — Use case leveraging graph patterns to detect fraudulent activity
-
Gremlin API — Tinkerpop Gremlin query language for graph traversals
-
Cypher — OpenCypher query language for pattern matching on graphs
4.11. Time Series
ArcadeDB includes a native Time Series engine designed for high-throughput ingestion and fast analytical queries over timestamped data. Unlike bolt-on solutions, the Time Series model is integrated directly into the multi-model core — the same database that stores graphs, documents, and key/value pairs can store and query billions of time-stamped samples with specialized columnar compression, SIMD-vectorized aggregation, and automatic lifecycle management.
Key capabilities:
-
Columnar storage with Gorilla (float), Delta-of-Delta (timestamp), Simple-8b (integer), and Dictionary (tag) compression — 0.4 to 1.4 bytes per sample
-
Shard-per-core parallelism with lock-free writes
-
Block-level aggregation statistics for zero-decompression fast-path queries
-
InfluxDB Line Protocol ingestion for compatibility with Telegraf, Grafana Agent, and hundreds of collection agents
-
Prometheus remote_write / remote_read protocol for drop-in Prometheus backend usage
-
PromQL query language — native parser and evaluator with HTTP-compatible API endpoints
-
SQL analytical functions — ts.timeBucket, ts.rate, ts.percentile, ts.interpolate, window functions, and more
-
Continuous aggregates with watermark-based incremental refresh
-
Retention policies and downsampling tiers for automatic data lifecycle
-
Grafana integration via DataFrame-compatible endpoints (works with the Infinity datasource plugin)
-
Studio TimeSeries Explorer with query, schema inspection, ingestion docs, and PromQL tabs
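To see why the timestamp column compresses so well, here is a sketch of Delta-of-Delta encoding (illustrative Python; ArcadeDB's actual encoder packs these values into variable-width bits rather than a Python list):

```python
def delta_of_delta_encode(timestamps):
    """Encode timestamps as first value, first delta, then delta-of-deltas.
    For regularly sampled series the delta-of-deltas are almost all zero,
    which is why they compress to a fraction of a byte per sample."""
    if len(timestamps) < 2:
        return list(timestamps)
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = out[1]
    for a, b in zip(timestamps[1:], timestamps[2:]):
        delta = b - a
        out.append(delta - prev_delta)
        prev_delta = delta
    return out

def delta_of_delta_decode(encoded):
    if len(encoded) < 2:
        return list(encoded)
    ts = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts

# One sample per second, with a single 1 ms jitter at the 4th sample
series = [1708430400000, 1708430401000, 1708430402000, 1708430403001, 1708430404001]
enc = delta_of_delta_encode(series)
print(enc)                # mostly zeros and small values after the header
assert delta_of_delta_decode(enc) == series
```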
4.11.1. Creating a TimeSeries Type
Use CREATE TIMESERIES TYPE to define a new time series type.
Every type requires a TIMESTAMP column, zero or more TAGS (low-cardinality indexed dimensions), and one or more FIELDS (high-cardinality measurement values).
CREATE TIMESERIES TYPE SensorReading
TIMESTAMP ts PRECISION NANOSECOND
TAGS (sensor_id STRING, location STRING)
FIELDS (
temperature DOUBLE,
humidity DOUBLE,
pressure DOUBLE
)
SHARDS 8
RETENTION 90 DAYS
COMPACTION_INTERVAL 1 HOURS
BLOCK_SIZE 65536
Minimal syntax (defaults for everything optional):
CREATE TIMESERIES TYPE SensorReading
TIMESTAMP ts
TAGS (sensor_id STRING)
FIELDS (temperature DOUBLE)
| Option | Default | Description |
|---|---|---|
| TIMESTAMP | (required) | Name of the timestamp column |
| PRECISION | | Timestamp resolution (e.g., NANOSECOND) |
| TAGS | (none) | Comma-separated tag columns (low-cardinality indexed dimensions) |
| FIELDS | (required) | Comma-separated field columns (high-cardinality measurement values) |
| SHARDS | CPU count | Number of shards for parallel writes |
| RETENTION | (none) | Automatic deletion of data older than the specified duration (e.g., 90 DAYS) |
| COMPACTION_INTERVAL | (none) | Splits sealed blocks at time-bucket boundaries for fast-path aggregation |
| BLOCK_SIZE | | Samples per sealed block |
| IF NOT EXISTS | (none) | Silently skip creation if the type already exists |
4.11.2. Altering a TimeSeries Type
Add downsampling policies to automatically reduce resolution of old data:
ALTER TIMESERIES TYPE SensorReading
ADD DOWNSAMPLING POLICY
AFTER 7 DAYS GRANULARITY 1 MINUTES
AFTER 30 DAYS GRANULARITY 1 HOURS
Remove all downsampling policies:
ALTER TIMESERIES TYPE SensorReading DROP DOWNSAMPLING POLICY
4.11.3. Dropping a TimeSeries Type
DROP TIMESERIES TYPE SensorReading
DROP TIMESERIES TYPE IF EXISTS SensorReading
4.11.4. Ingesting Data
There are four ways to ingest data into a TimeSeries type, each trading off throughput against convenience; see the comparison table at the end of this section.
InfluxDB Line Protocol (Recommended for High Throughput)
The fastest remote ingestion path. ArcadeDB exposes an InfluxDB Line Protocol-compatible HTTP endpoint that skips SQL parsing entirely.
POST /api/v1/ts/{database}/write?precision=ns
Content-Type: text/plain
SensorReading,sensor_id=sensor-A,location=building-1 temperature=22.5,humidity=65.0,pressure=1013.25 1708430400000000000
SensorReading,sensor_id=sensor-B,location=building-2 temperature=19.1,humidity=70.0 1708430400000000000
Line Protocol format: <measurement>[,<tag>=<value>…] <field>=<value>[,…] [<timestamp>]
Precision parameter: ns (nanoseconds, default), us (microseconds), ms (milliseconds), s (seconds).
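The line format can be illustrated with a minimal parser (a Python sketch that ignores escaping rules and type suffixes; not ArcadeDB's parser):

```python
def parse_line(line):
    """Parse a single InfluxDB Line Protocol line (no escaping handled):
    <measurement>[,<tag>=<value>...] <field>=<value>[,...] [<timestamp>]"""
    parts = line.split(" ")
    head, fields_part = parts[0], parts[1]
    timestamp = int(parts[2]) if len(parts) > 2 else None
    measurement, *tag_parts = head.split(",")
    tags = dict(t.split("=", 1) for t in tag_parts)
    fields = {}
    for f in fields_part.split(","):
        k, v = f.split("=", 1)
        fields[k] = float(v)
    return measurement, tags, fields, timestamp

m, tags, fields, ts = parse_line(
    "SensorReading,sensor_id=sensor-A,location=building-1 "
    "temperature=22.5,humidity=65.0 1708430400000000000")
print(m, tags["sensor_id"], fields["temperature"], ts)
```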
Example with curl:
curl -X POST "http://localhost:2480/api/v1/ts/mydb/write?precision=ns" \
-u root:password \
-H "Content-Type: text/plain" \
--data-binary 'SensorReading,sensor_id=sensor-A temperature=22.5 1708430400000000000
SensorReading,sensor_id=sensor-A temperature=22.6 1708430401000000000'
Example with Python:
import requests
lines = [
"SensorReading,sensor_id=sensor-A temperature=22.5 1708430400000000000",
"SensorReading,sensor_id=sensor-A temperature=22.6 1708430401000000000",
]
requests.post(
"http://localhost:2480/api/v1/ts/mydb/write?precision=ns",
auth=("root", "password"),
headers={"Content-Type": "text/plain"},
data="\n".join(lines),
)
If the type does not exist and auto-creation is enabled (arcadedb.tsAutoCreateType=true), the schema is inferred from the first line: measurement name becomes the type, tags become TAG columns, fields become FIELD columns with inferred types.
Prometheus Remote Write
ArcadeDB acts as a drop-in Prometheus remote storage backend. Configure Prometheus to write to ArcadeDB:
# prometheus.yml
remote_write:
- url: "http://localhost:2480/ts/mydb/prom/write"
basic_auth:
username: root
password: password
The endpoint accepts the standard Prometheus remote_write Protobuf payload (snappy-compressed).
Each time series is mapped to an ArcadeDB TimeSeries type named after the metric name (the __name__ label).
Types are auto-created if they do not exist.
SQL INSERT
Standard ArcadeDB SQL syntax works for TimeSeries types:
-- Single row
INSERT INTO SensorReading
SET ts = '2026-02-20T10:00:00.000Z',
sensor_id = 'sensor-A',
location = 'building-1',
temperature = 22.5,
humidity = 65.0
-- Batch insert
INSERT INTO SensorReading
(ts, sensor_id, location, temperature, humidity)
VALUES
('2026-02-20T10:00:00Z', 'sensor-A', 'building-1', 22.5, 65.0),
('2026-02-20T10:00:01Z', 'sensor-A', 'building-1', 22.6, 64.8),
('2026-02-20T10:00:02Z', 'sensor-B', 'building-2', 19.1, 70.0)
-- CONTENT syntax
INSERT INTO SensorReading
CONTENT { "ts": "2026-02-20T10:00:00Z", "sensor_id": "sensor-A", "temperature": 22.5 }
Java Embedded API
The fastest path — bypasses all protocol and SQL overhead:
TimeSeriesEngine engine = database.getSchema()
.getTimeSeriesType("SensorReading").getEngine();
long[] timestamps = { 1708430400000000000L, 1708430401000000000L };
String[] sensorIds = { "sensor-A", "sensor-A" };
double[] temperatures = { 22.5, 22.6 };
database.transaction(() -> {
engine.appendSamples(timestamps,
new Object[] { sensorIds, temperatures });
});
Ingestion Method Comparison
| Method | Throughput | Overhead | Best For |
|---|---|---|---|
| Java Embedded API | ~0.5-1 us/sample | None (direct) | Embedded applications |
| InfluxDB Line Protocol | ~1-5 us/sample | Text parsing | Remote ingestion, Telegraf |
| Prometheus Remote Write | ~2-10 us/sample | Protobuf + Snappy | Prometheus ecosystems |
| SQL INSERT | ~50-100 us/sample | SQL parsing + planning | Ad-hoc inserts, small batches |
4.11.5. Querying Time Series Data
SQL Queries
Time series types support standard SQL SELECT with WHERE, GROUP BY, and ORDER BY.
Time range conditions (BETWEEN, >, >=, <, <=, =) on the timestamp column are pushed down to the storage engine for efficient range scans.
-- Basic range query
SELECT ts, sensor_id, temperature, humidity
FROM SensorReading
WHERE ts BETWEEN '2026-02-19' AND '2026-02-20'
AND sensor_id = 'sensor-A'
ORDER BY ts
-- Aggregation with time bucketing
SELECT ts.timeBucket('1h', ts) AS hour,
sensor_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
min(temperature) AS min_temp,
count(*) AS sample_count
FROM SensorReading
WHERE ts BETWEEN '2026-02-19' AND '2026-02-20'
GROUP BY hour, sensor_id
ORDER BY hour
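The bucketing performed by ts.timeBucket boils down to truncating timestamps to interval boundaries, as this illustrative Python sketch shows (not the engine's implementation):

```python
def time_bucket(interval_ms, ts_ms):
    """Truncate a millisecond timestamp down to the start of its bucket,
    analogous to ts.timeBucket('1h', ts) in a GROUP BY."""
    return ts_ms - (ts_ms % interval_ms)

HOUR = 3_600_000
base = 1708423200000                      # an exact hour boundary
samples = [
    (base + 600_000, 22.5),               # 10 minutes into the hour
    (base + 1_800_000, 22.9),             # 30 minutes in
    (base + HOUR + 60_000, 21.0),         # next hour
]
buckets = {}
for ts, temp in samples:
    buckets.setdefault(time_bucket(HOUR, ts), []).append(temp)
avg = {b: sum(v) / len(v) for b, v in buckets.items()}
print(avg)   # one average per hour bucket
```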
TimeSeries SQL Functions
ArcadeDB provides a comprehensive set of ts.* SQL functions for time series analytics.
| Function | Description |
|---|---|
| ts.timeBucket | Truncates a timestamp to the nearest interval boundary, for use with GROUP BY. |
| | Returns the value corresponding to the earliest timestamp in the group. |
| | Returns the value corresponding to the latest timestamp in the group. |
| ts.rate | Per-second rate of change. Optional 3rd parameter enables counter-reset detection. |
| | Difference between the last and first values in the group. |
| | Moving average with a configurable window size. |
| ts.interpolate | Gap filling (the method is given as an argument, e.g. 'linear'). |
| ts.correlate | Pearson correlation coefficient between two series. |
| ts.percentile | Approximate percentile calculation (0.0-1.0). E.g., ts.percentile(latency_ms, 0.99) for p99. |
| | Window function: returns the value from a previous row. |
| | Window function: returns the value from a subsequent row. |
| | Window function: sequential 1-based row numbering. |
| | Window function: rank with ties, gaps after ties. |
Examples:
-- Rate of change with counter reset detection
SELECT ts.timeBucket('5m', ts) AS window,
ts.rate(request_count, ts, true) AS requests_per_sec
FROM HttpMetrics
WHERE ts > '2026-02-20T10:00:00Z'
GROUP BY window
-- Percentile calculation
SELECT ts.timeBucket('1h', ts) AS hour,
ts.percentile(latency_ms, 0.99) AS p99,
ts.percentile(latency_ms, 0.50) AS median
FROM ServiceMetrics
GROUP BY hour
-- Gap filling with linear interpolation
SELECT ts.timeBucket('1m', ts) AS minute,
ts.interpolate(temperature, 'linear', ts) AS temp
FROM SensorReading
WHERE ts BETWEEN '2026-02-20T10:00:00Z' AND '2026-02-20T11:00:00Z'
GROUP BY minute
-- Correlation between two fields
SELECT ts.correlate(temperature, humidity) AS correlation
FROM SensorReading
WHERE ts BETWEEN '2026-02-19' AND '2026-02-20'
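The counter-reset handling shown in the ts.rate example can be sketched as follows (illustrative Python under assumed semantics; ArcadeDB's exact window handling may differ):

```python
def rate(values, timestamps_s, counter_reset=False):
    """Per-second rate of change over a window, in the spirit of
    ts.rate(v, ts, true). With counter_reset=True a drop in a monotonically
    increasing counter is treated as a restart, not a negative rate."""
    increase = 0.0
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if counter_reset and delta < 0:
            delta = cur          # counter restarted from zero
        increase += delta
    elapsed = timestamps_s[-1] - timestamps_s[0]
    return increase / elapsed

# Counter resets between t=20 and t=30 (e.g., a process restart)
values = [100, 150, 10, 60]
ts = [0, 10, 20, 30]
print(rate(values, ts, counter_reset=True))   # (50 + 10 + 50) / 30
```

Without reset detection the drop from 150 to 10 would be counted as a large negative delta, producing a misleading negative rate.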
Dedicated JSON Query Endpoint
A simplified REST endpoint is available for programmatic access and Grafana integration:
POST /api/v1/ts/{database}/query
Content-Type: application/json
{
"type": "SensorReading",
"from": "2026-02-19T00:00:00Z",
"to": "2026-02-20T00:00:00Z",
"columns": ["temperature", "humidity"],
"tags": { "sensor_id": "sensor-A" },
"aggregation": "AVG",
"bucketInterval": "1h"
}
For raw (non-aggregated) queries, omit the aggregation and bucketInterval fields.
To retrieve the most recent data point:
GET /api/v1/ts/{database}/latest?type=SensorReading&tag=sensor_id:sensor-A
4.11.6. PromQL Query Language
ArcadeDB includes a native PromQL parser and evaluator, providing Prometheus-compatible query capabilities without requiring an external Prometheus server.
PromQL via HTTP API
The PromQL endpoints follow the Prometheus HTTP API format and return standard {status: "success", data: {…}} JSON responses.
Instant query:
GET /ts/{database}/prom/api/v1/query?query=avg(cpu_usage{host="srv1"})&time=1700000000
The time parameter is Unix seconds (float). Defaults to current time if omitted.
Range query:
GET /ts/{database}/prom/api/v1/query_range?query=rate(http_requests_total[5m])&start=1700000000&end=1700003600&step=60
All timestamps in Unix seconds. The step parameter accepts a duration string (60s, 1m) or seconds as a float.
Label discovery:
GET /ts/{database}/prom/api/v1/labels
GET /ts/{database}/prom/api/v1/label/{name}/values
GET /ts/{database}/prom/api/v1/series?match[]=cpu_usage{host=~"srv.*"}
Supported PromQL Features
| Category | Supported |
|---|---|
| Vector selectors | |
| Range selectors | |
| Aggregations | |
| Rate functions | |
| Over-time functions | |
| Math functions | |
| Label functions | |
| Other | |
| Operators | |
PromQL via SQL
The promql() SQL function calls the PromQL evaluator from within SQL queries:
-- Instant query at explicit time
RETURN promql('cpu_usage{host="srv1"}', 1700000000000)
-- Rate calculation
RETURN promql('rate(http_requests_total[5m])')
-- Scalar arithmetic
RETURN promql('2 + 3 * 4', 1000)
The function accepts 1-2 arguments: the PromQL expression (required) and an optional evaluation timestamp in milliseconds.
Prometheus Remote Read
Configure Prometheus to read from ArcadeDB for long-term storage:
# prometheus.yml
remote_read:
- url: "http://localhost:2480/ts/mydb/prom/read"
basic_auth:
username: root
password: password
The endpoint accepts the standard Prometheus remote_read Protobuf query (snappy-compressed) and supports =, !=, =~, !~ label matchers.
4.11.7. Continuous Aggregates
Continuous aggregates are pre-computed time-bucketed rollups that are automatically refreshed when new data is inserted. They dramatically speed up common dashboard queries by maintaining materialized summaries.
-- Create a continuous aggregate
CREATE CONTINUOUS AGGREGATE hourly_temps AS
SELECT ts.timeBucket('1h', ts) AS hour,
sensor_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
count(*) AS cnt
FROM SensorReading
GROUP BY hour, sensor_id
-- Query the aggregate like any other type
SELECT * FROM hourly_temps
WHERE hour BETWEEN '2026-02-19' AND '2026-02-20'
-- Manual refresh
REFRESH CONTINUOUS AGGREGATE hourly_temps
-- Drop
DROP CONTINUOUS AGGREGATE hourly_temps
The defining query must reference a TimeSeries source type and contain a ts.timeBucket() call with a GROUP BY clause.
After creation, every committed insert into the source type triggers an incremental refresh using watermark tracking — only new data since the last watermark is processed.
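The watermark-based incremental refresh described above can be sketched in a few lines. This is an illustrative model, not ArcadeDB's internals: the function names `bucket_of` and `refresh_aggregate` and the dict-based rollup store are hypothetical.

```python
def bucket_of(ts_ms, interval_ms):
    """Align a timestamp to the start of its time bucket."""
    return ts_ms - (ts_ms % interval_ms)

def refresh_aggregate(samples, watermark_ms, interval_ms, rollup):
    """Fold only samples newer than the watermark into the rollup,
    then advance the watermark."""
    new_watermark = watermark_ms
    for ts_ms, value in samples:
        if ts_ms <= watermark_ms:
            continue  # already covered by a previous refresh
        b = bucket_of(ts_ms, interval_ms)
        s, c = rollup.get(b, (0.0, 0))
        rollup[b] = (s + value, c + 1)  # keep sum/count so avg can be derived
        new_watermark = max(new_watermark, ts_ms)
    return new_watermark

rollup = {}
wm = refresh_aggregate([(1000, 20.0), (4000, 22.0)], 0, 3600_000, rollup)
# The second refresh skips the replayed old sample and only folds in the new one:
wm = refresh_aggregate([(1000, 99.0), (7000, 24.0)], wm, 3600_000, rollup)
```

Only data past the watermark is processed on each refresh, which is what keeps per-commit refresh cost proportional to the new data, not to the table size.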
Inspect continuous aggregates via:
SELECT FROM schema:continuousAggregates
This returns name, query, source type, bucket column, bucket interval, watermark timestamp, status, and metrics for each aggregate.
4.11.8. Retention Policies
Retention policies automatically delete data older than a specified duration. Set the retention period during type creation:
CREATE TIMESERIES TYPE SensorReading
TIMESTAMP ts
TAGS (sensor_id STRING)
FIELDS (temperature DOUBLE)
RETENTION 90 DAYS
A background maintenance scheduler (60-second interval) automatically enforces retention and downsampling policies on all TimeSeries types.
4.11.9. Downsampling Policies
Downsampling reduces the resolution of old data to save storage while preserving long-term trends. Multiple tiers can be defined to progressively reduce resolution as data ages:
ALTER TIMESERIES TYPE SensorReading
ADD DOWNSAMPLING POLICY
AFTER 7 DAYS GRANULARITY 1 MINUTES
AFTER 30 DAYS GRANULARITY 1 HOURS
In this example, data older than 7 days is downsampled to 1-minute resolution, and data older than 30 days is further reduced to 1-hour resolution. The downsampling process aggregates values using AVG within each granularity bucket, preserving tag groupings.
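The tier selection and AVG bucketing can be sketched as follows. This is an illustrative model of the policy above; the helper names `granularity_for` and `downsample` are hypothetical, and real downsampling also preserves tag groupings.

```python
DAY_MS = 86_400_000
MIN_MS = 60_000
HOUR_MS = 3_600_000

def granularity_for(age_ms):
    """Pick the target resolution for a sample of the given age."""
    if age_ms > 30 * DAY_MS:
        return HOUR_MS       # AFTER 30 DAYS GRANULARITY 1 HOURS
    if age_ms > 7 * DAY_MS:
        return MIN_MS        # AFTER 7 DAYS GRANULARITY 1 MINUTES
    return None              # young data stays at full resolution

def downsample(samples, now_ms):
    """Average values that fall into the same granularity bucket."""
    buckets = {}
    out = []
    for ts, v in samples:
        g = granularity_for(now_ms - ts)
        if g is None:
            out.append((ts, v))  # keep recent samples untouched
            continue
        buckets.setdefault((ts - ts % g, g), []).append(v)
    for (start, _), vals in sorted(buckets.items()):
        out.append((start, sum(vals) / len(vals)))  # AVG per bucket
    return out

now = 40 * DAY_MS
# Two 40-day-old samples collapse into one hourly average; the fresh sample survives:
res = downsample([(0, 10.0), (1_800_000, 20.0), (now - 1000, 5.0)], now)
```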
4.11.10. Grafana Integration
ArcadeDB provides Grafana DataFrame-compatible HTTP endpoints that work with the Grafana Infinity datasource plugin — no custom plugin is needed.
Endpoints:
- Datasource health check
- Metadata discovery: TimeSeries types, fields, tags, and available aggregation types
- Multi-target query returning the Grafana DataFrame wire format
The query endpoint supports raw queries, aggregated queries (SUM/AVG/MIN/MAX/COUNT), tag filtering, field projection, and automatic bucket interval calculation from Grafana’s maxDataPoints setting.
Grafana Infinity datasource configuration:
1. Install the Grafana Infinity datasource plugin.
2. Add a new Infinity datasource.
3. Set the base URL to http://<arcadedb-host>:2480.
4. Configure authentication (Basic Auth with ArcadeDB credentials).
5. Use the health endpoint for health checks.
4.11.11. Studio TimeSeries Explorer
The ArcadeDB Studio web interface includes a dedicated TimeSeries Explorer accessible from the main navigation sidebar.
The explorer provides four tabs:
- Query — Time range selector, aggregation controls, field checkboxes, interactive charts (ApexCharts with zoom), data table with pagination, auto-refresh capability
- Schema — Type introspection with column roles (TIMESTAMP/TAG/FIELD badges), diagnostics (total samples, shards, time range), configuration details, downsampling tiers, per-shard statistics
- Ingestion — Documentation and examples for all four ingestion methods with a method comparison table
- PromQL — PromQL expression input, instant/range toggle, time controls, chart and table rendering of results
4.11.12. HTTP API Reference
| Endpoint | Description |
|---|---|
| | InfluxDB Line Protocol ingestion. Body: plain text lines. Returns 204 on success. |
| | JSON query endpoint. Supports raw and aggregated queries with tag filtering and field projection. |
| GET /api/v1/ts/{database}/latest | Returns the most recent data point, with optional tag filter. |
| GET /ts/{database}/prom/api/v1/query | PromQL instant query. |
| GET /ts/{database}/prom/api/v1/query_range | PromQL range query. |
| GET /ts/{database}/prom/api/v1/labels | List all label names (metric names and tag columns). |
| GET /ts/{database}/prom/api/v1/label/{name}/values | List distinct values for a label. |
| GET /ts/{database}/prom/api/v1/series | Find series matching PromQL selector(s). |
| | Prometheus remote_write endpoint (snappy-compressed Protobuf). |
| POST /ts/{database}/prom/read | Prometheus remote_read endpoint (snappy-compressed Protobuf). |
| | Grafana datasource health check. |
| | Grafana metadata discovery. |
| | Grafana DataFrame query endpoint. |
4.11.13. High Availability
In an HA cluster, TimeSeries data is handled as follows:
- Mutable bucket data (`.tstb` files) is replicated to all followers via ArcadeDB’s standard page-replication protocol
- Sealed store data (`.ts.sealed` files) is not replicated — each node compacts independently from its local mutable data
- Reads are consistent after failover: the engine queries both sealed and mutable layers, so a newly promoted leader returns correct results even before its first compaction cycle
- Compaction lag: a node that has not yet compacted may serve reads slightly slower until the maintenance scheduler runs (default: 60 seconds)
4.11.14. Comparison with Other TimeSeries Databases
| Feature | ArcadeDB | InfluxDB 3 | TimescaleDB | Prometheus | QuestDB |
|---|---|---|---|---|---|
| License | Apache 2.0 | MIT (core) | Apache 2.0 (core) | Apache 2.0 | Apache 2.0 |
| Multi-Model | Graph + Document + K/V + TimeSeries + Vector in one engine | TimeSeries only | Relational + TimeSeries (PostgreSQL extension) | Metrics only | TimeSeries only (SQL) |
| Query Languages | SQL, PromQL, Cypher, Gremlin, GraphQL, MongoDB QL | SQL, InfluxQL | Full PostgreSQL SQL | PromQL | SQL (PG wire) |
| Ingestion Protocols | Line Protocol, Prometheus remote_write, SQL, Java API | Line Protocol, SQL | SQL (INSERT, COPY) | Prometheus scrape, remote_write | Line Protocol, SQL, CSV |
| Compression | Gorilla, Delta-of-Delta, Simple-8b, Dictionary | Parquet native (Delta, Dict, Snappy/ZSTD) | Gorilla, Delta-of-delta, Simple-8b, Dictionary, LZ4 | Gorilla (~1.37 B/sample) | ZFS-level + Parquet for cold tier |
| Continuous Aggregates | Yes (watermark-based, auto-refresh on commit) | Materialized views | Yes (policy-based refresh) | Recording rules | No |
| Downsampling | Yes (multi-tier, automatic) | Via compaction | Via continuous aggregates + retention | Recording rules | No |
| Retention Policies | Yes (automatic, per-type) | Yes | Yes (per-chunk) | Yes (per-block) | Yes (per-partition) |
| Grafana Integration | DataFrame endpoints (Infinity plugin) | Native plugin | PostgreSQL datasource | Native plugin | PostgreSQL datasource |
| PromQL Support | Native (parser + evaluator + HTTP endpoints) | No | Via adapter | Native | No |
| Embeddable (in-process) | Yes (Java library) | No | No (requires PostgreSQL) | No | No (separate process) |
| SIMD Aggregation | Yes (Java Vector API) | Via DataFusion (Arrow) | No | No | Yes (AVX2) |
| Prometheus Remote Write/Read | Yes | No | Via adapter | Native | No |
| Graph + TimeSeries Queries | Yes (native cross-model) | No | No | No | No |
4.11.15. Architecture
The TimeSeries engine uses a two-layer storage architecture:
- Mutable bucket — An append-only in-memory buffer backed by ArcadeDB’s `PaginatedComponent`. New samples land here first. This layer is ACID-transactional and replicated in HA mode.
- Sealed store — Immutable, compressed columnar blocks on disk. The maintenance scheduler periodically compacts mutable data into sealed blocks using Gorilla, Delta-of-Delta, Simple-8b, and Dictionary codecs. Block-level min/max/sum statistics enable zero-decompression aggregation when an entire block falls within a single time bucket.
Data is distributed across N shards (default: one per CPU core) for parallel writes and reads. Each shard maintains its own mutable bucket and sealed store. Queries merge results across shards using a min-heap priority queue sorted by timestamp.
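The cross-shard merge described above can be sketched with the standard library. This is an illustrative model: `heapq.merge` keeps a small heap holding one head entry per shard, exactly the min-heap-by-timestamp pattern the engine uses; the function name `merge_shards` is hypothetical.

```python
import heapq

def merge_shards(shard_results):
    """Each shard yields (timestamp, value) tuples in ascending order;
    heapq.merge interleaves them into one globally time-ordered stream."""
    return list(heapq.merge(*shard_results))

shard_a = [(1, "a1"), (4, "a2")]
shard_b = [(2, "b1"), (3, "b2")]
merged = merge_shards([shard_a, shard_b])
# merged → [(1, 'a1'), (2, 'b1'), (3, 'b2'), (4, 'a2')]
```

Because each shard's output is already sorted, the merge is O(total log shards) and can be streamed without materializing all shard results first.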
See Also
- Time Series Tutorial — Step-by-step hands-on guide to time series in ArcadeDB
- Realtime Analytics — Use case for streaming ingestion and real-time dashboards
4.12. Creating Databases
There are several ways to create databases; each has advantages in certain scenarios, laid out next.
Console
The console tool provides a SQL-like statement which is not part of the general ArcadeDB SQL dialect (see SQL overview). Since console commands can be passed via command line, this way is useful when preparing a database via script, Dockerfile, or CI.
Studio
The studio tool provides a GUI to create (and manage) databases. This method is useful when experimenting or using an exploratory approach. Under the hood, the studio uses the HTTP API.
HTTP API
The HTTP server command can be used to create a database remotely.
This is useful, for example, when programmatically managing databases in containers or on virtual machines.
Java API
The create method of the DatabaseFactory class can be used when using ArcadeDB as an embedded database.
Server Argument
Lastly, the defaultDatabases setting allows a database to be created implicitly when starting the server: a specified default database that does not exist yet is created.
4.13. Multi-Model Architecture
ArcadeDB is a multi-model database that natively supports documents, graphs, key-value, time series, and vector search within a single engine. All models share the same storage layer, the same transaction manager, and the same query infrastructure. This design eliminates the need for polyglot persistence — the practice of running separate specialized databases for different data models — and the operational complexity that comes with it.
4.13.1. Why Multi-Model Matters
Traditional architectures often pair a relational database with a graph database, a document store, a time series backend, and a vector index. Each system brings its own deployment, its own backup strategy, its own security model, and its own failure modes. Keeping data consistent across these systems requires ETL pipelines, message queues, or change-data-capture infrastructure that adds latency and introduces synchronization bugs.
ArcadeDB collapses this stack into one process. A single ACID transaction can create a document, connect it as a vertex in a graph, and index its embedding for vector similarity — atomically. There is no replication lag between models because there is no replication; the data exists once.
4.13.2. The Latency Problem with Multiple Databases
With polyglot persistence, a single user request often triggers a chain of sequential calls across different databases. Each call adds network latency, serialization overhead, and wait time. The total response time is the sum of all calls.
| Polyglot Persistence (Sequential Calls) | Database | Latency |
|---|---|---|
| Your App → MySQL | Relational | ~15ms |
| Wait… → MongoDB | Document | ~20ms |
| Wait… → Neo4j | Graph | ~12ms |
| Total: ~47ms+ (sum of all calls + serialization overhead) | | |

| ArcadeDB (Single Call) | Engine | Latency |
|---|---|---|
| Your App → ArcadeDB (graph + doc + search in one query) | Multi-Model | ~5ms |
| Total: ~5ms (one call, zero chaining, no serialization) | | |
With a multi-model database, your application makes a single call that can query graphs, documents, and full-text search at the same time. No chaining, no waiting, no redundant serialization. The result? Dramatically lower latency and simpler application code.
ArcadeDB natively supports six data models: Graph, Document, Key-Value, Full-Text Search, Vector, and Time-Series — all within a single, embeddable database engine.
4.13.3. Supported Data Models
Documents
Documents are the foundational record type. They store data as JSON-like structures with nested properties, lists, and maps. Types can operate in schema-full mode (all properties declared), schema-less mode (no schema constraints), or schema-hybrid mode where some properties are enforced while others remain free-form. Document types support inheritance, allowing a subtype to extend a parent type and inherit its properties.
Graph
The graph model represents data as vertices and edges with properties. Edges are first-class citizens stored in their own types — not join tables or adjacency lists bolted onto a relational schema. ArcadeDB supports two kinds of edges: lightweight edges for simple relationships that carry no properties, and regular edges for relationships that need their own attributes such as weight, timestamp, or label. Graph traversal operates at O(1) complexity per hop through index-free adjacency.
Key-Value
Every record in ArcadeDB is accessible by its Record ID (RID) at O(1) complexity, making the entire database function as a key-value store. Buckets can also serve as dedicated key-value namespaces where records are inserted and retrieved by RID without requiring a type schema. This model is useful for caching, session storage, and any scenario where direct access by identifier is the primary pattern.
Time Series
The time series engine provides columnar storage with Gorilla XOR compression for floats and Delta-of-Delta encoding for timestamps, achieving 0.4 to 1.4 bytes per sample.
Ingestion supports the InfluxDB Line Protocol, making it compatible with Telegraf, Grafana Agent, and hundreds of collection agents.
Queries use PromQL for monitoring workloads or SQL analytical functions such as ts.timeBucket, ts.rate, and ts.percentile for custom analysis.
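The Delta-of-Delta timestamp encoding mentioned above is what makes 0.4-1.4 bytes per sample possible: regular reporting intervals collapse into runs of zeros that bit-pack to almost nothing. A minimal sketch of the encoding step (illustrative only; the real codec also bit-packs the output):

```python
def delta_of_delta(timestamps):
    out = [timestamps[0]]            # first timestamp stored verbatim
    prev, prev_delta = timestamps[0], None
    for ts in timestamps[1:]:
        delta = ts - prev
        # store the first-order delta once, then only second-order deltas
        out.append(delta if prev_delta is None else delta - prev_delta)
        prev, prev_delta = ts, delta
    return out

# A sensor reporting every 10 units with one slightly late sample:
encoded = delta_of_delta([1000, 1010, 1020, 1030, 1041])
# encoded → [1000, 10, 0, 0, 1]  — mostly zeros, ideal for bit-packing
```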
Vector Search
Vector search enables similarity-based retrieval using the LSMVectorIndex.
The index implements both HNSW and Vamana (DiskANN) algorithms from JVector, supports COSINE, DOT_PRODUCT, and EUCLIDEAN similarity functions, and offers INT8 and BINARY quantization for memory reduction.
Vector indexes participate in ACID transactions and integrate directly into SQL queries through functions like vectorNeighbors().
4.13.4. Unified Query Layer
All data models are accessible through multiple query languages that operate on the same underlying records:
- SQL — The primary query language, extended with graph traversal, time series aggregation, and vector similarity functions
- Cypher — Pattern-matching queries for graph traversal
- Gremlin — Imperative graph traversal language (Apache TinkerPop)
- GraphQL — Schema-driven queries over HTTP
- MongoDB Query Language — Document-oriented queries for migration compatibility
- Redis Protocol — Key-value operations over the Redis wire protocol
Because all languages read from and write to the same storage, a record created through SQL is immediately visible to a Gremlin traversal, and an edge created through Cypher can be queried with GraphQL.
4.13.5. Type Inheritance Across Models
ArcadeDB types follow an object-oriented inheritance model. A type can extend another type, inheriting all of its properties, indexes, and constraints. This applies uniformly to document types, vertex types, and edge types.
For example, a Person vertex type can serve as the parent for both Employee and Customer types.
A query against Person automatically includes records from all subtypes, while a query against Employee returns only employees.
This polymorphic behavior works identically regardless of which query language is used.
4.13.6. Schema Flexibility
ArcadeDB supports three schema modes that can be mixed within the same database:
- Schema-full — Every property is declared with a type and optional constraints. Attempts to store undeclared properties are rejected.
- Schema-less — No schema is defined. Records accept arbitrary properties with no validation.
- Schema-hybrid — Some properties are declared and enforced, while additional properties can be added freely.
This flexibility allows different types in the same database to use different schema strategies.
A Transaction type might enforce strict schema-full validation, while a Metadata type might operate schema-less to accommodate evolving requirements.
4.13.7. Cross-Model Queries
The most powerful consequence of the multi-model architecture is the ability to combine models in a single query. A SQL statement can traverse graph edges, filter on document properties, aggregate time series data, and rank results by vector similarity — all in one pass, within one transaction.
SELECT FROM (
MATCH {type: User, as: u}-Follows->{type: User, as: f}
RETURN f
) WHERE vectorCosineSimilarity(f.embedding, ?) > 0.8
This query traverses the Follows graph to find users connected to a starting set, then filters the results to only those whose embedding vectors are semantically similar to a target.
No ETL, no cross-database joins, no eventual consistency — the graph traversal and the vector comparison execute against the same records in the same transaction.
4.14. Vector Search
ArcadeDB includes a native vector search engine for similarity-based retrieval of embeddings. Vector indexes are fully integrated into the SQL query engine and support ACID transactions, persistent storage, and automatic compaction.
4.14.1. How Vector Search Works
Vector search finds the nearest neighbors to a query vector in high-dimensional space. Instead of exact matching (like SQL WHERE), it finds the most similar items based on a distance or similarity metric.
Typical workflow:
1. Generate embeddings from your data using an external model (OpenAI, Sentence Transformers, etc.)
2. Store embeddings as vector properties on vertices or documents
3. Create a vector index on the property
4. Query with `vectorNeighbors()` to find the k most similar items
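Conceptually, the nearest-neighbor step boils down to scoring every candidate against the query vector and keeping the top k. A brute-force sketch (illustrative; a vector index replaces this O(n) scan with an approximate graph search, and the RIDs below are made up):

```python
import math

def cosine(a, b):
    """Cosine similarity: direction matters, magnitude does not."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, items, k):
    """items: list of (id, embedding); returns ids of the k most similar."""
    scored = sorted(items, key=lambda it: cosine(query, it[1]), reverse=True)
    return [rid for rid, _ in scored[:k]]

docs = [("#1:0", [1.0, 0.0]), ("#1:1", [0.9, 0.1]), ("#1:2", [0.0, 1.0])]
nearest = top_k([1.0, 0.0], docs, 2)
# nearest → ['#1:0', '#1:1']
```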
4.14.2. LSMVectorIndex Architecture
ArcadeDB’s vector index is built on two foundations:
- LSM Tree storage — ArcadeDB’s proven LSM Tree architecture provides persistent, crash-safe storage with automatic compaction
- JVector 4.0.0 — A high-performance vector search library that implements both HNSW (Hierarchical Navigable Small World) and Vamana (DiskANN) graph algorithms
The index stores vectors as a navigable graph where each node connects to its approximate nearest neighbors. Searches traverse this graph, narrowing in on the closest matches efficiently — typically in O(log n) time rather than O(n) brute-force scanning.
4.14.3. Flat vs Hierarchical Structure
The index supports two graph structures:
| | Flat (default) | Hierarchical |
|---|---|---|
| Algorithm | Single-layer Vamana graph | Multi-layer HNSW with exponential decay |
| Build speed | Faster | 10-20% slower |
| Disk usage | Baseline | 5-15% larger |
| Best for | < 100K vectors, well-clustered data | 100K+ vectors, 1024+ dimensions, diverse queries |
Enable hierarchical mode with addHierarchy: true in the index metadata.
4.14.4. Similarity Functions
Three distance metrics are available:
| Function | When to Use | Value Range |
|---|---|---|
| COSINE (default) | Text embeddings (BERT, GPT, Sentence Transformers). Direction matters, magnitude does not. | -1 to 1 |
| DOT_PRODUCT | Normalized vectors where speed matters. 10-15% faster than COSINE. | Unbounded |
| EUCLIDEAN | Spatial data, point clouds, continuous measurements. Absolute distance matters. | 0 to infinity |
| If your embeddings are already L2-normalized (unit vectors), use DOT_PRODUCT for best performance — it produces the same ranking as COSINE but skips the normalization step. |
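The tip above follows from a simple identity: for unit vectors the cosine denominator is always 1, so cosine similarity and dot product produce the same scores and therefore the same ranking. A small demonstration (illustrative):

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 normalization)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Three candidate embeddings and a query, all L2-normalized:
vecs = [normalize(v) for v in ([3.0, 4.0], [1.0, 0.0], [0.0, 2.0])]
q = normalize([1.0, 0.2])

# Rank candidates by dot product; on unit vectors this IS the cosine ranking,
# because cosine(a, b) = dot(a, b) / (|a| * |b|) and both norms equal 1.
by_dot = sorted(range(len(vecs)), key=lambda i: dot(q, vecs[i]), reverse=True)
```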
4.14.5. Quantization
Quantization reduces memory usage by compressing vector components at the cost of slight accuracy loss:
| Type | Memory Reduction | Speed | Recall | Use Case |
|---|---|---|---|---|
| NONE | Baseline | Baseline | 100% | Small datasets (< 10K vectors), maximum accuracy |
| INT8 (recommended) | 4x (75% savings) | 10-15% faster | 95-98% | Best balance of speed and accuracy for most workloads |
| BINARY | 32x (97% savings) | 15-20% faster | 85-92% | Massive datasets, approximate search with reranking |
| PRODUCT | 16-64x | Approximate | Varies | Very large datasets (100K+), enables zero-disk-I/O graph construction |
| Use INT8 quantization for most use cases. It provides 4x memory savings with minimal accuracy loss and significantly faster search. Only use NONE for very small datasets where maximum precision matters. |
Why INT8 is faster
Quantization doesn’t just save memory — it fundamentally changes how vectors are read during search.
Without quantization (NONE), each node visited during graph traversal requires a full document lookup: read the record from disk, deserialize it, and extract the vector property. With INT8, vectors are stored in compact contiguous index pages and read directly — no document deserialization needed.
In benchmarks with 500K 384-dimensional vectors (matching the all-MiniLM-L6-v2 embedding model), INT8 reduces search latency by 2.5x compared to NONE:
| Quantization | Mean latency | p95 latency | Vector fetch path |
|---|---|---|---|
| NONE | 3.50 ms | 4.36 ms | Document lookup (random I/O) |
| INT8 | 1.59 ms | 1.94 ms | Index pages (sequential I/O) |
The difference becomes even more significant under memory pressure. With NONE quantization, vector data is 4x larger, evicting more data from memory caches and forcing real disk I/O. INT8 keeps the working set small enough to stay in memory even with constrained resources.
Quantization is transparent — queries work identically regardless of quantization setting. The index automatically quantizes on insert and dequantizes on retrieval.
When using PRODUCT quantization, the graph build uses Product Quantization scores instead of exact vector distances. PQ codes are compact and stay in memory, eliminating disk I/O during graph construction. This is most effective on large datasets (100K+ vectors) where PQ quality is sufficient.
| Quantization | Vector memory | Search speed |
|---|---|---|
| NONE | 156 MB | Baseline |
| INT8 | 39 MB | ~2.5x faster |
| BINARY | 5 MB | ~3x faster (lower recall) |
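The transparent quantize-on-insert / dequantize-on-retrieval behavior can be sketched with symmetric per-vector INT8 quantization. This is an illustrative model: JVector's actual scheme differs in detail, but the trade-off (one byte per component, error bounded by one quantization step) is the same.

```python
def quantize_int8(vec):
    """Map each component to an integer in [-127, 127] with one shared scale."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back for all-zero vectors
    codes = [round(x / scale) for x in vec]        # one byte per component
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float components from the byte codes."""
    return [c * scale for c in codes]

codes, scale = quantize_int8([0.5, -1.0, 0.25])
approx = dequantize_int8(codes, scale)
# Each recovered component is within one quantization step (scale ≈ 1/127) of the original.
```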
4.14.6. Key Parameters
| Parameter | Default | Purpose |
|---|---|---|
| `dimensions` | (required) | Must match your embedding model output size |
| `similarity` | `COSINE` | Distance metric: COSINE, DOT_PRODUCT, or EUCLIDEAN |
| `quantization` | | Compression: NONE, INT8, BINARY, or PRODUCT. INT8 is recommended for most use cases. |
| `efSearch` | adaptive | Search beam width at query time. Controls recall vs speed trade-off. See efSearch and Adaptive Search. |
| | 16 | Connections per node. Higher = better recall, more memory |
| | 100 | Search depth during build. Higher = better index quality, slower builds |
| `addHierarchy` | false | Enable multi-layer HNSW for large/complex datasets |
| | false | Co-locate vectors in graph file for faster retrieval at large scale |
| | true | Build the HNSW graph immediately at index creation time |
4.14.7. efSearch and Adaptive Search
The efSearch parameter controls how many candidate nodes the search explores in the vector graph. Higher values find more accurate results but take longer.
When efSearch is not explicitly set (either on the index or per-query), ArcadeDB uses an adaptive two-pass strategy:
- First pass — Uses a moderate beam width (`2 × k`), which is sufficient for most queries on well-clustered data.
- Second pass — If the first pass returns insufficient results, the search automatically widens the beam to `10 × k`.
For small indexes (< 10K vectors), the full default efSearch is always used since the cost is negligible.
This adaptive behavior gives you fast queries on easy lookups while still maintaining recall on harder queries — without requiring any tuning.
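The two-pass strategy can be sketched as a small wrapper around the graph search. This is an illustrative model: `graph_search` stands in for the real HNSW/Vamana traversal, and `YIELD_BY_BEAM` is a toy stand-in simulating a hard query where a wider beam surfaces more results.

```python
def adaptive_search(graph_search, query, k, ef_search=None):
    if ef_search is not None:
        return graph_search(query, k, ef_search)  # explicit setting always wins
    results = graph_search(query, k, 2 * k)       # first pass: moderate beam
    if len(results) < k:
        results = graph_search(query, k, 10 * k)  # second pass: widen the beam
    return results

# Toy search: maps beam width to the number of results "found" for k=10.
YIELD_BY_BEAM = {20: 6, 100: 10}

def fake_search(query, k, ef):
    found = YIELD_BY_BEAM.get(ef, min(ef, k))
    return list(range(min(k, found)))

# First pass (ef=20) finds only 6 of 10 results, so the beam widens to 100:
hits = adaptive_search(fake_search, "q", 10)
```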
Setting efSearch
You can set efSearch at three levels:
Per-query (highest priority) — pass as the 4th argument to vectorNeighbors(), either positionally or via the named options map:
-- Higher efSearch for a critical query that needs maximum recall (positional)
SELECT expand(vectorNeighbors('Doc[embedding]', [...], 10, 500))
-- Lower efSearch for a latency-sensitive query (positional)
SELECT expand(vectorNeighbors('Doc[embedding]', [...], 10, 30))
-- Options map form (extensible; also supports `filter`)
SELECT expand(vectorNeighbors('Doc[embedding]', [...], 10, { efSearch: 500 }))
Per-index — set in the index metadata at creation time:
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 1024,
similarity: 'COSINE',
efSearch: 200
}
Adaptive (default) — when neither per-query nor per-index efSearch is specified, the adaptive strategy described above is used.
| For most workloads, the adaptive default works well. Only set efSearch explicitly if you need consistently high recall regardless of query difficulty, or if you have strict latency requirements. |
4.14.8. Filtered Search
Vector search can be combined with a logical filter on the same type by passing a filter option containing the allowed RIDs. The HNSW traversal restricts itself to that set, so non-matching vectors are skipped without decoding.
-- Find the 10 most similar documents within a specific tenant and category
SELECT vectorNeighbors(
'Document[embedding]',
:queryVector,
10,
{ filter: (SELECT @rid FROM Document WHERE tenantId = 'acme' AND category = 'finance') }
)
The filter value accepts a list of RIDs, RID strings, or any Identifiable. It can be produced by a subquery, a query parameter, or built programmatically.
| Very selective filters (only a tiny fraction of records match) can starve the HNSW beam; combine `filter` with a higher `efSearch` to preserve recall. |
4.14.9. Multi-Modal Search
A single vertex type can have multiple vector indexes on different properties:
CREATE INDEX ON Product (imageEmbedding) LSM_VECTOR METADATA {dimensions: 512, similarity: 'COSINE'}
CREATE INDEX ON Product (textEmbedding) LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE'}
Query each index independently to search by image similarity, text similarity, or combine scores.
4.14.10. Integration with Other Models
Vector search combines naturally with ArcadeDB’s other data models:
- Graph + Vectors — Find similar items, then traverse relationships to discover connected context (Graph RAG pattern)
- Full-text + Vectors — Hybrid search combining keyword matching with semantic similarity (Knowledge Graph pattern)
- Time Series + Vectors — Detect behavioral anomalies by comparing embedding patterns over time (Fraud Detection pattern)
4.14.11. Further Reading
- Vector Search Tutorial — Step-by-step hands-on guide
- Vector Embeddings How-To — Index creation, tuning, and best practices
- Java Vector API — Programmatic vector index management
- SQL Vector Functions — All 40+ vector SQL functions
4.15. High Availability
ArcadeDB is designed for continuous operation. Its high availability (HA) architecture ensures that your database remains accessible even when individual servers fail, providing both fault tolerance and horizontal read scalability.
This page explains the concepts behind ArcadeDB’s HA architecture. For step-by-step setup instructions, see HA Configuration.
4.15.1. Leader-Replica Replication Model
ArcadeDB uses a leader-replica replication model. At any point in time, a cluster has exactly one leader server and one or more replica servers.
- The leader coordinates all write operations and distributes changes to replicas.
- Replicas serve read requests (queries) independently, allowing the cluster to scale read throughput horizontally.
Clients can connect to any server in the cluster. If a client sends a write request to a replica, the replica transparently forwards it to the leader. This means applications do not need to distinguish between leader and replica servers — the cluster handles routing internally.
4.15.2. Server Roles and Election
Every server in the cluster operates in one of two roles:
- LEADER — The single server responsible for accepting and coordinating all write operations. Only one leader exists at any time.
- REPLICA — A server that maintains a copy of the data and serves read requests. Replicas receive changes from the leader and can be promoted to leader if the current leader fails.
When a server starts, it joins the cluster as a replica and participates in an election process. ArcadeDB uses the RAFT consensus protocol for leader election, which guarantees that the cluster agrees on exactly one leader at all times.
If the cluster already has a healthy leader, the new server simply joins as a replica. If no leader exists — for example, during initial cluster formation or after a leader failure — an election determines which server becomes the new leader.
4.15.3. Replication and the Write-Ahead Journal
The leader replicates changes to replicas through a journal-based mechanism. Each server maintains its own journal (write-ahead log) that records all modifications.
When the leader processes a write operation, it:
1. Records the change in its local journal.
2. Sends the change to all replica servers.
3. Waits for a quorum of replicas to acknowledge the change (see below).
4. Commits the transaction and returns the result to the client.
The journal also plays a critical role in recovery. When a server rejoins the cluster after a failure, it uses the journal to identify the most up-to-date replica and to realign any servers that fell behind.
4.15.4. Write Quorum and Consistency
ArcadeDB uses a configurable write quorum to control the trade-off between consistency and availability. The quorum defines how many servers must acknowledge a write before it is considered committed.
The default quorum is MAJORITY, meaning more than half of the servers in the cluster must confirm the write.
This default provides a strong balance: it ensures data durability across multiple servers while tolerating the failure of a minority of nodes.
Other quorum options include:
- `none` — the leader commits immediately without waiting for any replica (fastest, least durable).
- A specific number (`1`, `2`, `3`, etc.) — waits for exactly that many acknowledgments.
- `all` — waits for every server in the cluster to acknowledge (strongest durability, highest latency).
If the quorum cannot be met — for example, because too many replicas are unavailable — the transaction is rolled back on all servers and an error is returned to the client. This behavior protects data integrity by preventing partially committed writes.
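The quorum arithmetic can be sketched directly. This is an illustrative model (the function names are hypothetical); note that MAJORITY requires more than half of all servers, which is also what makes split-brain commits impossible: at most one partition can ever reach that count.

```python
def required_acks(quorum, cluster_size):
    """Translate a quorum setting into a required acknowledgment count."""
    if quorum == "none":
        return 0
    if quorum == "majority":
        return cluster_size // 2 + 1   # strictly more than half
    if quorum == "all":
        return cluster_size
    return int(quorum)                 # explicit number

def can_commit(quorum, cluster_size, reachable_servers):
    """A write commits only if enough servers (leader included) acknowledge it."""
    return reachable_servers >= required_acks(quorum, cluster_size)

# 5-server cluster with MAJORITY quorum: a 3-server partition can still commit,
# while the 2-server minority partition becomes effectively read-only.
ok = can_commit("majority", 5, 3)
blocked = can_commit("majority", 5, 2)
```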
For more on how transactions interact with replication, see Transactions. For quorum and other cluster settings, see Server Settings.
4.15.5. Automatic Failover
ArcadeDB handles leader failure automatically. When replicas detect that the leader is no longer reachable, they initiate a new election using the RAFT protocol. The replica with the most up-to-date journal is elected as the new leader, and the cluster resumes normal operation.
This failover process is transparent to clients. After a brief interruption during the election, clients reconnect to the new leader and continue operating without manual intervention.
Common causes of leader unavailability include process termination, server shutdown or reboot, and network partitions that isolate the leader from the rest of the cluster.
4.15.6. Split-Brain Protection
A split-brain occurs when a network partition divides a cluster into two or more groups, each believing it should operate independently. This can lead to conflicting writes and data divergence.
ArcadeDB prevents split-brain scenarios through its majority quorum requirement. Because a leader must receive acknowledgment from a majority of servers to commit writes, at most one partition can ever hold a majority. Any minority partition cannot elect a leader or commit writes, so it effectively becomes read-only until the network heals.
This design guarantees that the cluster never produces conflicting committed data, even during network failures.
4.15.7. Cluster Discovery and Configuration
Servers discover each other through a configured list of seed servers. At startup, each server contacts the addresses in this list to find and join an existing cluster, or to form a new one if no cluster exists yet.
By default, if no server list is configured, ArcadeDB uses auto-discovery to locate other servers on the local network. You can also run multiple independent clusters on the same network by assigning each cluster a unique name.
For detailed configuration options and deployment instructions, see HA Configuration.
4.17. LSM-Tree Algorithm
ArcadeDB’s default index algorithm is the LSM-Tree. For equality-only lookups, ArcadeDB also provides a Hash Index based on extendible hashing.
4.17.1. Quick Overview
The LSM-Tree index is optimized for writes: insertion is O(1) because it does not require rebalancing the tree (unlike the B+Tree), while reads range from O(log(N)) to O(log(N)^2), depending on how fragmented (not yet compacted) the index is at a given point in time.
The class LSMTreeIndex is the main entrypoint for the index. It contains an instance of the LSMTreeIndexMutable class, which manages the current mutable index, plus a subIndex in case a compacted index is attached.
ArcadeDB uses pages to store index entries. Once full, those pages are immutable and cannot be changed; this is why it is considered an append-only index. After a while, ArcadeDB runs the compactor, which compacts the index by reading the content of the pages from disk, compressing it, and writing it back into new pages on disk. This process happens in the background while the index remains usable by the application.
The LSMTreeIndexMutable class holds the current page in RAM so new entries can be quickly appended to it. When the database is closed or the page is full, the page is serialized to disk and becomes immutable. Since pages are immutable, deletions are managed with special placeholders that mark entries as removed.
Since the LSM-Tree is append-only, the most recently updated entries are at the end of the index. If an entry is created and then deleted, the deletion always appears after the creation. For this reason, cursors start from the last page and move back to the first available one; while browsing backwards, the cursor keeps track of the deleted values it encounters.
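The backward scan with deletion tracking can be modeled as follows (a simplified sketch, not the actual LSMTreeIndex implementation; the Entry record and the page layout are invented for illustration). Because the index is append-only, the first occurrence of a key while walking backwards is its latest state, so a tombstone hides all older versions of that key.

```java
import java.util.*;

public class ReverseScanSketch {
    // An index entry: a key plus a tombstone flag marking a deletion.
    record Entry(String key, boolean deleted) {}

    // Scan pages from the most recent to the oldest; skip any key whose
    // latest state has already been resolved (returned or tombstoned).
    public static List<String> scan(List<List<Entry>> pages) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // keys already resolved, live or deleted
        for (int p = pages.size() - 1; p >= 0; p--) {
            List<Entry> page = pages.get(p);
            for (int i = page.size() - 1; i >= 0; i--) {
                Entry e = page.get(i);
                if (seen.add(e.key()) && !e.deleted())
                    result.add(e.key());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Entry>> pages = List.of(
            List.of(new Entry("a", false), new Entry("b", false)), // oldest page
            List.of(new Entry("b", true),  new Entry("c", false))  // newest page
        );
        // "b" was deleted after its creation, so only "c" and "a" survive.
        System.out.println(scan(pages)); // [c, a]
    }
}
```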
Compaction
During compaction, new compressed versions of the pages are stored.
The pages are stored in segments where the root page is the first page of the segment.
A file can have multiple segments.
Compaction requires a significant amount of RAM to properly compact large indexes and make them efficient.
The setting arcadedb.indexCompactionRAM determines the maximum amount of RAM the compaction task is allowed to use; by default it is 300MB.
The setting arcadedb.indexCompactionMinPagesSchedule tells the index when to schedule a compaction; by default, compaction is scheduled once at least 10 pages are not compacted. Setting it to 0 disables automatic compaction.
If, for example, the compaction task finds 20 pages to compact but the RAM budget allows only 10 pages at a time, the task processes the first 10 pages into one segment and the remaining 10 pages into a following segment. Having multiple segments in the same file makes it possible to run the compaction with a minimal amount of RAM if needed.
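The segment arithmetic in this example can be sketched as follows (a hypothetical helper, not ArcadeDB code): the RAM budget caps the number of pages per compaction pass, and the remainder spills into additional segments of the same file.

```java
import java.util.*;

public class CompactionBatches {
    // Split `totalPages` into batches of at most `maxPagesInRam` pages each;
    // every batch becomes its own compacted segment in the index file.
    public static List<Integer> segmentSizes(int totalPages, int maxPagesInRam) {
        List<Integer> segments = new ArrayList<>();
        for (int remaining = totalPages; remaining > 0; remaining -= maxPagesInRam)
            segments.add(Math.min(remaining, maxPagesInRam));
        return segments;
    }

    public static void main(String[] args) {
        // 20 pages to compact, RAM budget allows 10 pages at a time:
        System.out.println(segmentSizes(20, 10)); // [10, 10]
        System.out.println(segmentSizes(25, 10)); // [10, 10, 5]
    }
}
```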
Page layout
There are two types of pages: root pages and data pages. Pages are populated from the head by storing pointers to the key/value pairs, which are written from the tail. A page is full when there is no space left between the head (key pointers) and the tail (key/value pairs). When a page is full, another page is created; full pages remain until a compaction runs.
Root Page
It’s the first page of a segment of pages. The header contains more information than the Data Page (see below). Both root pages and data pages store keys. With mutable indexes the only difference is that the root page has a larger header. For compacted indexes, the root page does not store entries, but contains pointers to the data pages with the first key indexed on the page of the current segment.
Header:
[offsetFreeKeyValueContent(int:4),numberOfEntries(int:4),mutable(boolean:1),compactedPageNumberOfSeries(int:4),subIndexFileId(int:4),numberOfKeys(byte:1),keyType(byte:1)*]
-
offsetFreeKeyValueContent is the offset in the page of free space to store key/value pairs. When a page is new, the offset points to the end of the page because the pairs are filled from the end to the beginning
-
numberOfEntries number of entries (pairs) in the current page
-
mutable 1 means mutable, 0 immutable. Immutable pages cannot be modified, only compacted and then removed at the end of the compaction cycle
-
compactedPageNumberOfSeries number of pages in the compacted segment. This is stored in the last page of the segment and is used to count the pages of the current segment and therefore to find the root page of the segment
-
subIndexFileId file id of compacted page segment
-
numberOfKeys number of keys. If you’re using a composite index, multiple keys are used, otherwise only 1
-
keyType an array of key types. If the numberOfKeys is 3, then 3 bytes for keyType are written
Data Page
Header:
[offsetFreeKeyValueContent(int:4),numberOfEntries(int:4),mutable(boolean:1),compactedPageNumberOfSeries(int:4)]
See the root page above for the meaning of the header fields.
Mutable Structure
Before being compacted, the LSM-Tree index is conceptually simple: new entries are always inserted into the last page of the file, which is the only mutable page of the index. If the page is full, a new mutable page is created and the old one becomes immutable and is stored on disk. The entries in a page are written already ordered by key, so a cursor can browse the page from the first entry to the last to obtain ordered items. Key lookups inside a page use binary search. Different pages can contain entries with the same key, because entries may have been added to the index at different times. On writing, the order of the keys is maintained only inside the same page. This means that to obtain a truly ordered iterator, a virtual iterator is created with pointers to all the pages. Every time the iterator moves forward, the next available key of every page is compared and the smallest is returned. This can become quite expensive with millions of entries and hundreds of pages. For this reason, a periodic compaction is needed to keep values ordered together in the same page and to remove deleted items.
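The virtual iterator described above is essentially a k-way merge over per-page cursors. A minimal sketch (simplified, ignoring duplicates and tombstones, with invented names) using a priority queue keyed on each page's next entry:

```java
import java.util.*;

public class MergedPageIterator {
    // Merge several already-sorted pages into one ordered stream by always
    // pulling the smallest head key among the page cursors. This mirrors the
    // "compare the next available key of every page, return the smallest"
    // behavior of the virtual iterator.
    public static List<String> orderedKeys(List<List<String>> sortedPages) {
        // Each heap element is {pageIndex, offsetInPage}, ordered by the key it points to.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            Comparator.comparing((int[] pos) -> sortedPages.get(pos[0]).get(pos[1])));
        for (int p = 0; p < sortedPages.size(); p++)
            if (!sortedPages.get(p).isEmpty())
                heads.add(new int[]{p, 0});

        List<String> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] pos = heads.poll();
            List<String> page = sortedPages.get(pos[0]);
            out.add(page.get(pos[1]));
            if (pos[1] + 1 < page.size())           // advance this page's cursor
                heads.add(new int[]{pos[0], pos[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(orderedKeys(List.of(
            List.of("a", "d", "f"),
            List.of("b", "c", "e")))); // [a, b, c, d, e, f]
    }
}
```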
5. How-To Guides
5.1. Data Modeling
5.1.1. Full-Text Index
ArcadeDB’s full-text index is built on Apache Lucene and supports configurable analyzers, advanced Lucene query syntax, multi-property indexing, relevance scoring, and "More Like This" similarity search.
Creating a Full-Text Index
CREATE INDEX ON Article (content) FULL_TEXT
Multiple properties can be indexed together:
CREATE INDEX ON Article (title, body) FULL_TEXT
Configuring the Analyzer
Pass a METADATA block to choose a Lucene analyzer:
CREATE INDEX ON Article (content) FULL_TEXT
METADATA {
"analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer",
"allowLeadingWildcard": false,
"defaultOperator": "OR"
}
Metadata Options
| Option | Type | Default | Description |
|---|---|---|---|
| analyzer | string | StandardAnalyzer | Lucene analyzer class used for both indexing and querying |
|  | string | — | Override analyzer used only at index time |
|  | string | — | Override analyzer used only at query time |
| allowLeadingWildcard | boolean | false | Allow leading * wildcard queries |
| defaultOperator | string | OR | Default operator between terms when none is specified |
| <field>_analyzer | string | — | Per-field analyzer override, e.g. title_analyzer |
Common Analyzers
| Analyzer Class | Description |
|---|---|
| org.apache.lucene.analysis.standard.StandardAnalyzer | General-purpose tokenizer (default) |
| org.apache.lucene.analysis.en.EnglishAnalyzer | English stemming and stop words |
| org.apache.lucene.analysis.core.SimpleAnalyzer | Lowercase only, no stop words |
| org.apache.lucene.analysis.core.WhitespaceAnalyzer | Split on whitespace only |
Per-Field Analyzers
For multi-property indexes, each field can use a different analyzer:
CREATE INDEX ON Article (title, body) FULL_TEXT
METADATA {
"analyzer": "org.apache.lucene.analysis.standard.StandardAnalyzer",
"title_analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"
}
Here title uses the English analyzer while body falls back to the standard analyzer.
Searching
SEARCH_INDEX()
Searches a named full-text index using Lucene query syntax.
SELECT * FROM Article
WHERE SEARCH_INDEX('Article[content]', 'java programming')
Signature: SEARCH_INDEX(indexName, query)
| Parameter | Type | Description |
|---|---|---|
| indexName | string | The full index name as shown in the schema, e.g. Article[content] |
| query | string | A Lucene query string |
SEARCH_FIELDS()
Finds the full-text index automatically from field names, without needing to know the index name.
SELECT * FROM Article
WHERE SEARCH_FIELDS(['title', 'body'], 'database tutorial')
Signature: SEARCH_FIELDS(fieldNames, query)
| Parameter | Type | Description |
|---|---|---|
| fieldNames | array of strings | Fields to search; a full-text index covering these fields must exist |
| query | string | A Lucene query string |
Lucene Query Syntax
Both SEARCH_INDEX and SEARCH_FIELDS accept standard Lucene query syntax.
Boolean Operators
| Syntax | Meaning |
|---|---|
| term1 term2 | Either term (OR, default) |
| +term1 +term2 | Both terms required (AND) |
| -term | Excludes the term (NOT) |
| term1 AND term2 | Explicit AND |
| term1 OR term2 | Explicit OR |
-- Requires both terms
SELECT * FROM Article WHERE SEARCH_INDEX('Article[content]', '+java +programming')
-- Excludes documents about python
SELECT * FROM Article WHERE SEARCH_INDEX('Article[content]', 'java -python')
Phrase Queries
Wrap a phrase in double quotes to require terms to appear in order:
SELECT * FROM Article WHERE SEARCH_INDEX('Article[content]', '"machine learning"')
Wildcard Queries
| Syntax | Matches |
|---|---|
| data* | database, datastore, dataset… |
| dat?base | database, datXbase… |
| *data | Requires allowLeadingWildcard: true |
SELECT * FROM Article WHERE SEARCH_INDEX('Article[content]', 'data*')
Fuzzy Queries
Append ~ to match terms within an edit distance:
SELECT * FROM Article WHERE SEARCH_INDEX('Article[content]', 'database~')
-- also matches terms within edit distance 2 (Lucene default), e.g. "databse", "databases"
Field-Qualified Queries (Multi-Property Indexes)
For indexes over multiple fields, restrict a term to a specific field:
-- Only match "database" in the title field
SELECT * FROM Article WHERE SEARCH_INDEX('Article[title,body]', 'title:database')
-- Combine field-specific and general terms
SELECT * FROM Article WHERE SEARCH_INDEX('Article[title,body]', '+title:"multi model" -nosql')
Relevance Score ($score)
Every match carries a relevance score.
Use $score in projections or ORDER BY clauses:
SELECT title, $score
FROM Article
WHERE SEARCH_INDEX('Article[content]', 'java programming')
ORDER BY $score DESC
Documents that match more query terms receive higher scores.
SELECT title, $score AS relevance
FROM Article
WHERE SEARCH_FIELDS(['content'], 'java programming')
ORDER BY relevance DESC
LIMIT 10
More Like This
"More Like This" finds documents similar to one or more source documents. It extracts representative terms from the sources, then searches for other documents sharing those terms.
| The four full-text search functions (SEARCH_INDEX, SEARCH_FIELDS, fulltext.searchIndexMore, and fulltext.searchFieldsMore) can also be called through their aliases, e.g. SEARCH_INDEX_MORE for fulltext.searchIndexMore and SEARCH_FIELDS_MORE for fulltext.searchFieldsMore. |
fulltext.searchIndexMore()
SELECT title, $score, $similarity
FROM Article
WHERE `fulltext.searchIndexMore`('Article[content]', [#10:3])
ORDER BY $similarity DESC
Signature: fulltext.searchIndexMore(indexName, sourceRIDs [, options]) (alias: search_index_more)
| Parameter | Type | Description |
|---|---|---|
| indexName | string | Full-text index name |
| sourceRIDs | array of RIDs | One or more source documents |
| options | map | Optional MLT tuning options (see More Like This Configuration). Unknown keys are rejected. |
fulltext.searchFieldsMore()
Same as fulltext.searchIndexMore but resolves the full-text index automatically from field names:
SELECT title, $similarity
FROM Article
WHERE `fulltext.searchFieldsMore`(['content'], [#10:3])
ORDER BY $similarity DESC
Signature: fulltext.searchFieldsMore(fieldNames, sourceRIDs [, options]) (alias: search_fields_more)
| Parameter | Type | Description |
|---|---|---|
| fieldNames | array of strings | Fields to search; a full-text index covering these fields must exist |
| sourceRIDs | array of RIDs | One or more source documents |
| options | map | Optional MLT tuning options (see More Like This Configuration). Unknown keys are rejected. |
Multiple Source Documents
Provide multiple RIDs to find documents similar to a combination of sources:
SELECT title, $similarity
FROM Article
WHERE SEARCH_INDEX_MORE('Article[content]', [#10:3, #10:4])
ORDER BY $similarity DESC
Similarity Score ($similarity)
$similarity is a normalized score from 0.0 to 1.0.
The most similar document in the result set always receives 1.0.
SELECT title, $score, $similarity
FROM Article
WHERE SEARCH_INDEX_MORE('Article[content]', [#10:3])
ORDER BY $similarity DESC
LIMIT 5
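The normalization rule can be modeled in a few lines (a sketch of the concept, not ArcadeDB internals): every raw relevance score is divided by the highest score in the result set, so the best match always lands exactly at 1.0.

```java
import java.util.*;

public class SimilarityNormalization {
    // Normalize raw relevance scores to the [0.0, 1.0] range;
    // the top-scoring document always maps to exactly 1.0.
    public static double[] normalize(double[] rawScores) {
        double max = Arrays.stream(rawScores).max().orElse(0.0);
        if (max == 0.0)
            return new double[rawScores.length]; // no matches: all zeros
        return Arrays.stream(rawScores).map(s -> s / max).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[]{4.0, 2.0, 1.0})));
        // [1.0, 0.5, 0.25]
    }
}
```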
More Like This Configuration
Pass an optional map of tuning options. Unknown keys are rejected with a descriptive error to catch typos; keys are case sensitive.
SELECT title, $similarity
FROM Article
WHERE `fulltext.searchFieldsMore`(['title', 'content'], [#10:3], {
minTermFreq: 1,
minDocFreq: 3,
maxQueryTerms: 50,
excludeSource: false
})
| Option | Type | Default | Description |
|---|---|---|---|
| minTermFreq | int |  | Minimum times a term must appear in the source document(s) to be considered |
| minDocFreq | int |  | Minimum number of index documents that must contain a term for it to be used |
|  | float |  | Exclude terms appearing in more than this fraction of all documents |
| maxQueryTerms | int |  | Maximum number of terms to use in the similarity query |
|  | int |  | Ignore terms shorter than this length (0 = no minimum) |
|  | int |  | Ignore terms longer than this length (0 = no maximum) |
|  | boolean |  | Weight terms by TF-IDF score rather than raw frequency |
| excludeSource | boolean |  | Exclude source documents from results |
|  | int |  | Maximum number of source RIDs allowed |
Practical Examples
Blog Search with Ranking
SELECT title, author, $score AS relevance
FROM BlogPost
WHERE SEARCH_INDEX('BlogPost[title,body]', '+java +spring -legacy')
ORDER BY relevance DESC
LIMIT 20
Autocomplete with Prefix Wildcard
SELECT title
FROM Product
WHERE SEARCH_INDEX('Product[name]', 'micro*')
Stemming with English Analyzer
With EnglishAnalyzer, searching for "run" also matches "running" and "runs":
CREATE INDEX ON Article (content) FULL_TEXT
METADATA { "analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer" }
SELECT * FROM Article WHERE SEARCH_INDEX('Article[content]', 'running')
-- also returns documents containing "run" and "runs"
Notes
-
Full-text indexes require all indexed properties to be of type STRING.
-
Full-text indexes cannot be marked UNIQUE.
-
Without a METADATA block, indexes use StandardAnalyzer with OR default operator and leading wildcards disabled.
-
Indexes created without metadata continue to work with SEARCH_INDEX and SEARCH_INDEX_MORE exactly as before.
-
$score and $similarity are always available as query variables for matching documents; non-matching documents receive 0.
5.1.2. Geospatial Index
ArcadeDB’s geospatial index enables spatial queries directly from SQL using the geo.* function namespace.
Geometries are stored as Well-Known Text (WKT) strings in regular STRING properties — no special column type is required.
The index is built on ArcadeDB’s LSM-Tree storage engine, which means ACID guarantees, write-ahead logging, high availability replication, and automatic compaction all apply without any extra configuration.
Internally, the index decomposes each geometry into GeoHash tokens using a RecursivePrefixTreeStrategy, stores each token in the underlying LSM-Tree, and applies an exact Spatial4j predicate as a post-filter when queries are evaluated.
Creating a Geospatial Index
Create a STRING property, then add a GEOSPATIAL index to it:
CREATE VERTEX TYPE Location
CREATE PROPERTY Location.coords STRING
CREATE INDEX ON Location (coords) GEOSPATIAL
The index type keyword is GEOSPATIAL.
Only STRING properties are supported (WKT values are stored as strings).
Configuring Precision
The precision level controls the GeoHash grid resolution used to decompose geometries. Higher precision means more tokens per geometry, finer resolution, and a larger index.
CREATE INDEX ON Location (coords) GEOSPATIAL
METADATA { "precision": 9 }
| Precision | Cell size (approx.) | Typical use |
|---|---|---|
| 5 | ~4.9 km × 4.9 km | Country / region level |
| 7 | ~153 m × 153 m | City / neighborhood level |
| 9 | ~4.8 m × 4.8 m | Street-address precision |
| 11 (default) | ~2.4 m × 2.4 m | High precision — building footprints, GPS tracks |
Changing precision after data has been indexed requires rebuilding the index with REBUILD INDEX.
Inserting Geometries
Store any valid WKT string in the indexed property.
Pass it as a plain string literal or use the geo.asText() function to convert a geometry object back to WKT:
-- Insert a point by WKT literal
INSERT INTO Location SET name = 'Eiffel Tower', coords = 'POINT(2.2945 48.8584)'
-- Insert a polygon by WKT literal
INSERT INTO Location SET name = 'Paris Centre', coords =
'POLYGON((2.2945 48.8584, 2.3522 48.8566, 2.3488 48.8791, 2.2945 48.8584))'
-- Use geo.asText() to store a constructed geometry
INSERT INTO Location SET name = 'Arc de Triomphe',
coords = geo.asText(geo.point(2.2950, 48.8738))
Records with null or unparsable WKT values in the indexed property are silently skipped during indexing.
Querying with Spatial Predicates
Use any geo.* predicate in a WHERE clause.
When a geospatial index exists on the referenced field, the query planner automatically uses it — no hint or special syntax is required.
-- All locations within a bounding polygon
SELECT name FROM Location
WHERE geo.within(coords, geo.geomFromText('POLYGON((2.28 48.85, 2.40 48.85, 2.40 48.89, 2.28 48.89, 2.28 48.85))')) = true
-- All locations that intersect a line
SELECT name FROM Location
WHERE geo.intersects(coords, geo.geomFromText('LINESTRING(2.29 48.85, 2.35 48.87)')) = true
-- All locations within 500 m of a point (full-scan fallback — see note)
SELECT name FROM Location
WHERE geo.dWithin(coords, geo.geomFromText('POINT(2.2945 48.8584)'), 0.0045) = true
geo.dWithin and geo.disjoint always perform a full scan because their semantics cannot be efficiently mapped to the GeoHash candidate set.
All other predicates use the geospatial index automatically.
How Index-Accelerated Queries Work
-
The SQL query planner detects a spatial predicate function implementing IndexableSQLFunction in the WHERE clause.
-
It calls allowsIndexedExecution(), which returns true when the first argument is a bare field reference and a GEOSPATIAL index exists on that field.
-
The index decomposes the query shape into GeoHash cells and looks up candidate record IDs in the LSM-Tree.
-
The predicate function’s exact Spatial4j implementation post-filters the candidates to remove false positives (GeoHash cells are rectangular approximations).
The result is correct — the index is a superset filter, and the exact predicate is always applied afterward.
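The two-step lookup can be illustrated with a toy model (invented names, and a 1-degree integer grid standing in for GeoHash cells; not ArcadeDB code): the coarse grid yields a candidate superset, then the exact predicate removes the false positives.

```java
import java.util.*;
import java.util.function.Predicate;

public class CoarseThenExactFilter {
    // Toy index: coarse 1-degree grid cell -> ids of records stored in that cell.
    static Map<String, Set<Integer>> grid = new HashMap<>();
    static Map<Integer, double[]> points = new HashMap<>(); // id -> {x, y}

    static String cellOf(double x, double y) {
        return (int) Math.floor(x) + ":" + (int) Math.floor(y);
    }

    static void insert(int id, double x, double y) {
        points.put(id, new double[]{x, y});
        grid.computeIfAbsent(cellOf(x, y), k -> new HashSet<>()).add(id);
    }

    // Step 1: collect candidates from every grid cell the query box overlaps
    // (a superset: cells are rectangular approximations of the query shape).
    // Step 2: apply the exact predicate to discard the false positives.
    static List<Integer> query(double minX, double minY, double maxX, double maxY) {
        Set<Integer> candidates = new TreeSet<>();
        for (int x = (int) Math.floor(minX); x <= (int) Math.floor(maxX); x++)
            for (int y = (int) Math.floor(minY); y <= (int) Math.floor(maxY); y++)
                candidates.addAll(grid.getOrDefault(x + ":" + y, Set.of()));
        Predicate<Integer> exact = id -> {
            double[] p = points.get(id);
            return p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY;
        };
        return candidates.stream().filter(exact).toList();
    }

    public static void main(String[] args) {
        insert(1, 2.29, 48.86); // inside the query box
        insert(2, 2.95, 48.01); // same coarse cell, but outside the box
        System.out.println(query(2.0, 48.5, 2.5, 49.0)); // [1]
    }
}
```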
geo.* Functions Reference
Geospatial functions are organised into two groups: constructor / accessor functions (pure computation, no index) and spatial predicate functions (used in WHERE clauses, index-accelerated where possible).
All functions accept either WKT strings or geometry objects produced by other geo.* functions.
All predicate functions return null when either argument is null (SQL three-valued logic).
geo.geomFromText()
Parses any WKT string into a geometry object that can be passed to other geo.* functions.
Syntax: geo.geomFromText(<wkt-string>)
SELECT geo.geomFromText('POINT(2.2945 48.8584)') AS geom
SELECT geo.geomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))') AS geom
Throws IllegalArgumentException for malformed WKT.
geo.point()
Creates a 2D point from longitude and latitude (or X and Y) coordinates.
Syntax: geo.point(<x>, <y>)
SELECT geo.point(2.2945, 48.8584) AS pt
geo.lineString()
Creates a line string from a list of points.
Syntax: geo.lineString([<point>*])
SELECT geo.lineString([geo.point(0, 0), geo.point(10, 10), geo.point(20, 0)]) AS line
geo.polygon()
Creates a polygon from an ordered list of points. The first and last point must be identical to close the ring.
Syntax: geo.polygon([<point>*])
SELECT geo.polygon([
geo.point(0, 0), geo.point(10, 0),
geo.point(10, 10), geo.point(0, 10),
geo.point(0, 0)
]) AS poly
geo.buffer()
Returns a new geometry that is a buffer of the given distance (in degrees) around the input geometry.
Syntax: geo.buffer(<geometry>, <distance-in-degrees>)
-- Buffer 0.01 degrees (~1.1 km) around a point
SELECT geo.buffer(geo.point(2.2945, 48.8584), 0.01) AS buffered
| Distance is expressed in degrees. To convert a distance in metres to degrees, divide by 111,000 (1 degree ≈ 111 km at the equator). |
geo.envelope()
Returns the minimum bounding rectangle (envelope) of a geometry as a polygon WKT.
Syntax: geo.envelope(<geometry>)
SELECT geo.envelope(geo.geomFromText('LINESTRING(0 0, 10 5, 20 0)')) AS bbox
geo.distance()
Returns the distance between two points using the Haversine formula (great-circle distance).
Syntax: geo.distance(<geometry1>, <geometry2> [, <unit>])
The optional unit parameter accepts 'km' (default) or 'm'.
-- Distance in kilometres between two points on a record
SELECT geo.distance(coords, geo.geomFromText('POINT(2.3522 48.8566)')) AS dist_km
FROM Location
-- Distance in metres
SELECT geo.distance(geo.point(0, 0), geo.point(1, 1), 'm') AS dist_m
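For reference, the Haversine formula itself can be sketched as follows (a generic implementation, not ArcadeDB's code; the Earth-radius constant is an assumption of this sketch):

```java
public class HaversineSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (assumed)

    // Great-circle distance in kilometres between two (lon, lat) points in degrees.
    public static double distanceKm(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator is roughly 111.2 km,
        // which is also the basis of the degree-to-metre rule of thumb above.
        System.out.printf("%.1f km%n", distanceKm(0, 0, 1, 0));
        // Eiffel Tower to central Paris (coordinates used earlier in this section).
        System.out.printf("%.1f km%n", distanceKm(2.2945, 48.8584, 2.3522, 48.8566));
    }
}
```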
geo.area()
Returns the area of a geometry in square degrees.
Syntax: geo.area(<geometry>)
SELECT geo.area(geo.geomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')) AS area
geo.asText()
Converts a geometry object to its WKT string representation.
Syntax: geo.asText(<geometry>)
SELECT geo.asText(geo.point(2.2945, 48.8584)) AS wkt
-- Returns: 'POINT (2.2945 48.8584)'
geo.asGeoJson()
Converts a geometry object to a GeoJSON string.
Syntax: geo.asGeoJson(<geometry>)
SELECT geo.asGeoJson(geo.point(2.2945, 48.8584)) AS geojson
-- Returns: '{"type":"Point","coordinates":[2.2945,48.8584]}'
geo.x()
Extracts the X coordinate (longitude) from a point geometry.
Syntax: geo.x(<point>)
SELECT geo.x(geo.geomFromText('POINT(2.2945 48.8584)')) AS longitude
-- Returns: 2.2945
geo.y()
Extracts the Y coordinate (latitude) from a point geometry.
Syntax: geo.y(<point>)
SELECT geo.y(geo.geomFromText('POINT(2.2945 48.8584)')) AS latitude
-- Returns: 48.8584
Spatial Predicate Functions
All predicate functions accept a field reference or WKT string as the first argument and a geometry (from any geo.* constructor) as the second.
When a GEOSPATIAL index exists on the referenced field, the query planner uses it automatically.
| Function | Semantics | Index used? |
|---|---|---|
| geo.within() | First geometry fully within the second | Yes |
| geo.intersects() | Geometries share at least one point | Yes |
| geo.contains() | First geometry fully contains the second | Yes |
| geo.equals() | Geometries are geometrically identical | Yes |
| geo.crosses() | Geometries cross (intersection of lower dimension) | Yes |
|  |  | Yes |
|  |  | Yes |
| geo.dWithin() | First geometry within a given distance of the second | No — full scan |
| geo.disjoint() | Geometries share no points | No — full scan |
geo.disjoint cannot use the index because the index stores records that intersect indexed cells; records absent from the index are exactly those that are disjoint.
geo.dWithin falls back to a full scan because correct proximity indexing requires expanding the query shape into a bounding circle first — planned as a future enhancement.
geo.within()
Returns true if the first geometry is fully within the second.
SELECT name FROM Location
WHERE geo.within(coords, geo.geomFromText(
'POLYGON((2.28 48.84, 2.42 48.84, 2.42 48.90, 2.28 48.90, 2.28 48.84))'
)) = true
geo.intersects()
Returns true if the geometries share at least one point.
SELECT name FROM Location
WHERE geo.intersects(coords, geo.geomFromText('LINESTRING(0 0, 10 10)')) = true
geo.contains()
Returns true if the first geometry fully contains the second.
SELECT name FROM Zone
WHERE geo.contains(boundary, geo.geomFromText('POINT(2.2945 48.8584)')) = true
geo.dWithin()
Returns true if the first geometry is within the given distance (in degrees) of the second.
-- All locations within 0.01 degrees (~1.1 km) of the Eiffel Tower
SELECT name FROM Location
WHERE geo.dWithin(coords, geo.geomFromText('POINT(2.2945 48.8584)'), 0.01) = true
geo.disjoint()
Returns true if the geometries share no points.
SELECT name FROM Location
WHERE geo.disjoint(coords, geo.geomFromText(
'POLYGON((0 0, 5 0, 5 5, 0 5, 0 0))'
)) = true
geo.equals()
Returns true if the geometries are geometrically identical.
SELECT * FROM Location
WHERE geo.equals(coords, geo.geomFromText('POINT(2.2945 48.8584)')) = true
geo.crosses()
Returns true if the geometries cross (share some but not all interior points, and the intersection is of lower dimension than the geometries).
SELECT name FROM Road
WHERE geo.crosses(path, geo.geomFromText('LINESTRING(0 5, 10 5)')) = true
Practical Examples
Points of Interest Near a Landmark
-- All POIs within a 500 m bounding box of the Eiffel Tower
SELECT name, coords
FROM PointOfInterest
WHERE geo.within(coords, geo.buffer(geo.point(2.2945, 48.8584), 0.0045)) = true
ORDER BY geo.distance(coords, geo.geomFromText('POINT(2.2945 48.8584)'))
LIMIT 20
Delivery Zones Containing an Address
SELECT zone_name, carrier
FROM DeliveryZone
WHERE geo.contains(boundary, geo.geomFromText('POINT(2.3522 48.8566)')) = true
Fleet Vehicles Inside a Geofence
SELECT vehicle_id, last_seen
FROM Vehicle
WHERE geo.within(last_position,
geo.geomFromText('POLYGON((2.28 48.85, 2.42 48.85, 2.42 48.90, 2.28 48.90, 2.28 48.85))')
) = true
AND last_seen > date('2026-01-01', 'yyyy-MM-dd')
Notes and Limitations
-
Geometries must be stored as WKT strings in STRING properties. No dedicated geometry column type is introduced; WKT is the only supported format at storage level.
-
GeoJSON is not a storage format, but geo.asGeoJson() converts any geometry to GeoJSON for output.
-
3D geometry and raster data are not supported.
-
The antimeridian and polar edge cases are handled correctly by the underlying GeoHash grid.
-
Changing the precision level of an existing geospatial index requires rebuilding the index:
REBUILD INDEX `Location[coords]`
-
A GEOSPATIAL index cannot be marked UNIQUE.
5.1.3. Materialized Views
A materialized view is a schema-level object that stores the result of a SQL SELECT query as a backing document type.
Unlike regular views (which re-execute the query on every access), a materialized view holds a pre-computed snapshot of data that can be queried directly for fast reads.
A materialized view:
-
Wraps a SQL SELECT query as its defining query
-
Stores results in a backing DocumentType with standard buckets
-
Supports three refresh modes: manual, incremental (post-commit), and periodic (scheduled)
-
Persists its definition and metadata in schema.json alongside other schema objects
-
Can be created, dropped, refreshed, and altered via SQL DDL statements or the Java Schema API
Refresh Modes
MANUAL
The view data is never automatically updated.
You must trigger a refresh explicitly via REFRESH MATERIALIZED VIEW or the Java API.
Use this when you control refresh timing yourself or when source data changes infrequently.
INCREMENTAL
After every committed transaction that modifies a source type, ArcadeDB automatically refreshes the view in a post-commit callback. The refresh is:
-
Transactionally safe: runs after the source transaction commits successfully; rolled-back transactions do not trigger a refresh
-
Batched per transaction: multiple record changes in a single transaction result in one refresh, not one per record
-
For simple queries (single source type, no aggregates, no GROUP BY, no subqueries, no JOINs): performs a full refresh
-
For complex queries (aggregates, GROUP BY, etc.): also performs a full refresh (per-record incremental optimization is planned for a future release)
If the refresh fails, the view is marked STALE and a warning is logged.
A manual refresh can recover it.
PERIODIC
A background scheduler thread runs a full refresh at the specified interval after each successful refresh completes. Intervals are specified in seconds, minutes, or hours:
REFRESH EVERY 30 SECOND
REFRESH EVERY 5 MINUTE
REFRESH EVERY 1 HOUR
The scheduler uses a single daemon thread (ArcadeDB-MV-Scheduler) shared across all periodic views.
If the database is closed, all scheduled tasks are cancelled automatically.
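The scheduler behavior described above can be approximated with a ScheduledExecutorService (a conceptual model with invented names, not the actual ArcadeDB-MV-Scheduler): a single shared daemon thread runs each view's refresh with a fixed delay, so the interval is counted from the completion of the previous refresh.

```java
import java.util.concurrent.*;

public class PeriodicRefreshSketch {
    // One daemon thread shared by all periodic views, mirroring the single
    // "ArcadeDB-MV-Scheduler" thread described above (names here are invented).
    static final ScheduledExecutorService SCHEDULER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "MV-Scheduler-sketch");
            t.setDaemon(true); // daemon: does not keep the JVM alive
            return t;
        });

    // Fixed delay: the next run starts `interval` after the previous refresh
    // completes, matching "after each successful refresh completes".
    public static ScheduledFuture<?> schedule(Runnable refresh, long interval, TimeUnit unit) {
        return SCHEDULER.scheduleWithFixedDelay(refresh, interval, interval, unit);
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch ran = new CountDownLatch(2);
        ScheduledFuture<?> task = schedule(ran::countDown, 100, TimeUnit.MILLISECONDS);
        ran.await();        // wait for two refresh cycles to complete
        task.cancel(false); // what closing the database does for every scheduled view
        System.out.println("two refreshes completed, task cancelled");
    }
}
```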
View Status
Each view tracks a status field that reflects its current state:
| Status | Meaning |
|---|---|
| VALID | Data is up to date with the last refresh |
| STALE | A refresh failed or was interrupted; data may be outdated |
| BUILDING | A refresh is currently in progress |
| ERROR | The last refresh encountered a fatal error |
If the database crashes while a view is BUILDING, the status is reset to STALE on the next startup to signal that the data may be incomplete.
Querying a Materialized View
Query a materialized view exactly like any other document type:
SELECT * FROM ActiveUsers
SELECT name FROM ActiveUsers WHERE name LIKE 'A%'
SELECT count(*) FROM RecentOrders
Java API
Creating a view
database.transaction(() -> {
database.getSchema().buildMaterializedView()
.withName("ActiveUsers")
.withQuery("SELECT name, email FROM User WHERE active = true")
.withRefreshMode(MaterializedViewRefreshMode.MANUAL)
.create();
});
Builder options:
| Method | Description |
|---|---|
| withName() | Name for the view (required) |
| withQuery() | Defining SQL SELECT query |
| withRefreshMode() | Refresh mode: MANUAL, INCREMENTAL, or PERIODIC |
|  | Number of buckets for the backing type |
|  | Page size for the backing type |
|  | Interval in milliseconds for PERIODIC refresh |
|  | When … |
Querying schema
Schema schema = database.getSchema();
// Check existence
boolean exists = schema.existsMaterializedView("ActiveUsers");
// Get a specific view
MaterializedView view = schema.getMaterializedView("ActiveUsers");
// List all views
MaterializedView[] views = schema.getMaterializedViews();
Refreshing and dropping
// Programmatic refresh
database.getSchema().getMaterializedView("ActiveUsers").refresh();
// Drop via schema
database.getSchema().dropMaterializedView("ActiveUsers");
// Drop via the view itself
database.getSchema().getMaterializedView("ActiveUsers").drop();
Inspecting a view
MaterializedView view = database.getSchema().getMaterializedView("HourlySummary");
view.getName(); // "HourlySummary"
view.getQuery(); // the defining SQL query
view.getRefreshMode(); // MaterializedViewRefreshMode.PERIODIC
view.getStatus(); // "VALID", "STALE", "BUILDING", or "ERROR"
view.getLastRefreshTime(); // epoch millis of last successful refresh
view.isSimpleQuery(); // true if eligible for per-record optimization
view.getSourceTypeNames(); // list of source type names parsed from the query
view.getBackingType(); // the underlying DocumentType
Behavior and Constraints
-
Backing type protection: You cannot DROP TYPE a type that backs a materialized view. Drop the materialized view first.
-
Name uniqueness: The view name must not match any existing type or materialized view.
-
Source type validation: All types referenced in the FROM clause must exist when the view is created.
-
Persistence: View definitions are stored in schema.json under a "materializedViews" key and survive database restarts. Listener registration for INCREMENTAL views and scheduler tasks for PERIODIC views are re-established on startup.
-
Transaction safety: The initial full refresh and all subsequent refreshes run inside their own transactions.
-
Query result columns: Only non-internal properties (those not starting with @) are copied into the backing type during refresh.
-
No schema on backing type: The backing document type is schema-less; property types are not enforced.
Error Handling
-
If a post-commit refresh fails (INCREMENTAL mode), the view is marked STALE and a WARNING is logged. The source transaction is unaffected.
-
If a periodic refresh fails, the view is marked ERROR and a SEVERE log entry is written. The scheduler continues running and will retry on the next interval.
-
Callback errors in the transaction callback system are logged at WARNING level and do not affect the triggering transaction or other callbacks.
Limitations
-
ALTER MATERIALIZED VIEW is not yet implemented.
-
Per-record incremental refresh (tracking _sourceRID to update individual view rows) is a planned future optimization. Currently, all refresh operations perform a full truncate-and-reload.
No support for cross-database queries in the defining query.
-
Server replication: materialized view data lives in the local backing type and is replicated like any other document type in an HA cluster, but refresh triggering is local to the node that executes the write.
Example: Sales Dashboard
-- Source type
CREATE DOCUMENT TYPE Sale;
-- A periodic summary refreshed every minute
CREATE MATERIALIZED VIEW SalesByProduct
AS SELECT product, sum(amount) AS total, count(*) AS count
FROM Sale
GROUP BY product
REFRESH EVERY 1 MINUTE;
-- An incremental view of recent activity (simple query)
CREATE MATERIALIZED VIEW RecentSales
AS SELECT product, amount, date
FROM Sale
WHERE date >= '2026-01-01'
REFRESH INCREMENTAL;
-- Query the views
SELECT * FROM SalesByProduct ORDER BY total DESC;
SELECT product, amount FROM RecentSales WHERE amount > 1000;
-- Manual refresh after a bulk import
REFRESH MATERIALIZED VIEW SalesByProduct;
-- Teardown
DROP MATERIALIZED VIEW SalesByProduct;
DROP MATERIALIZED VIEW RecentSales;
5.2. Graph OLAP Engine
| Available since ArcadeDB v26.4.1. |
The Graph OLAP Engine maintains a read-optimized, columnar representation of your graph alongside the live OLTP data. It uses Compressed Sparse Row (CSR) encoding and flat primitive arrays to deliver 5x–400x speedups on analytical workloads — multi-hop traversals, graph algorithms, and property aggregations — without sacrificing transactional safety.
Why Graph OLAP?
ArcadeDB’s OLTP engine is optimized for point lookups and ACID transactions. Analytical workloads — PageRank, community detection, multi-hop traversals — access millions of edges in tight loops. The row-oriented, pointer-chasing nature of OLTP storage causes cache misses, object overhead, and GC pressure.
The OLAP engine solves this by encoding graph topology as flat int[] arrays and properties as typed columns:
-
Sequential memory access — cache-line friendly, no pointer chasing
-
Zero object allocation — no GC pressure during traversal
-
SIMD-friendly — enables JVM vectorized operations
-
9x more compact — flat arrays vs. Java object overhead
Graph Analytical View (GAV)
A Graph Analytical View is a named, schema-persisted OLAP snapshot of selected vertex types, edge types, and properties.
GraphAnalyticalView gav = GraphAnalyticalView.builder(database)
.withName("social")
.withVertexTypes("Person", "Company")
.withEdgeTypes("FOLLOWS", "WORKS_AT")
.withProperties("name", "age", "status")
.withUpdateMode(UpdateMode.SYNCHRONOUS)
.build();
Named views are persisted in schema.json and automatically restored on database restart.
SQL
Creating a view
CREATE GRAPH ANALYTICAL VIEW social
VERTEX TYPES (Person, Company)
EDGE TYPES (FOLLOWS, WORKS_AT)
PROPERTIES (name, age, status)
UPDATE MODE SYNCHRONOUS
All clauses after the view name are optional. A minimal view covering the entire graph:
CREATE GRAPH ANALYTICAL VIEW fullGraph
Use IF NOT EXISTS to avoid errors if the view already exists:
CREATE GRAPH ANALYTICAL VIEW social IF NOT EXISTS
VERTEX TYPES (Person)
EDGE TYPES (FOLLOWS)
UPDATE MODE SYNCHRONOUS
You can also materialize edge properties (e.g., weights):
CREATE GRAPH ANALYTICAL VIEW weighted
VERTEX TYPES (City)
EDGE TYPES (ROAD)
EDGE PROPERTIES (distance, toll)
UPDATE MODE SYNCHRONOUS
COMPACTION THRESHOLD 50000
Altering a view
Change the update mode or compaction threshold of an existing view:
ALTER GRAPH ANALYTICAL VIEW social UPDATE MODE ASYNCHRONOUS
ALTER GRAPH ANALYTICAL VIEW social COMPACTION THRESHOLD 20000
Rebuilding a view
Force a full rebuild of the CSR snapshot:
REBUILD GRAPH ANALYTICAL VIEW social
Dropping a view
DROP GRAPH ANALYTICAL VIEW social
DROP GRAPH ANALYTICAL VIEW IF EXISTS social
Listing views
SELECT FROM schema:graphAnalyticalViews
Builder Options
| Method | Description | Default |
|---|---|---|
| withName() | Named registration + schema persistence | anonymous |
| withVertexTypes() | Filter to specific vertex types | all |
| withEdgeTypes() | Filter to specific edge types | all |
| withProperties() | Materialize specific vertex properties | all |
| withEdgeProperties() | Materialize edge properties (e.g., weights) | none |
| withUpdateMode() | OFF, SYNCHRONOUS, or ASYNCHRONOUS | OFF |
| (compaction threshold) | Rebuild CSR after N accumulated delta edges | 10,000 |
Async Build for Large Graphs
For large graphs, use buildAsync() to avoid blocking the calling thread:
GraphAnalyticalView gav = GraphAnalyticalView.builder(database)
.withName("large-graph")
.withUpdateMode(UpdateMode.ASYNCHRONOUS)
.buildAsync();
// Wait for build completion
boolean ready = gav.awaitReady(30, TimeUnit.SECONDS);
Update Modes
The GAV supports three synchronization modes between OLTP and OLAP:
| Mode | Behavior | Staleness | Use Case |
|---|---|---|---|
| OFF | Marks view STALE on commit; requires manual rebuild | Until rebuild | Batch analytics, static snapshots |
| SYNCHRONOUS | Applies an overlay on each commit | Zero | Real-time analytics, consistent reads |
| ASYNCHRONOUS | Triggers background rebuild on commit | Brief BUILDING window | Large graphs, tolerable brief inconsistency |
In SYNCHRONOUS mode, the engine captures transaction deltas (new/deleted vertices, added/removed edges, property changes) and merges them into an immutable overlay on top of the base CSR. Readers always see a consistent snapshot via an atomic volatile reference swap. When the overlay accumulates too many changes (configurable threshold, default 10,000 edges), a background compaction rebuilds the full CSR.
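The overlay-and-compaction cycle described above can be sketched in a few lines of Python. This is an illustrative model, not the engine's actual API; the threshold of 3 stands in for the 10,000-edge default:

```python
# Minimal model of immutable-snapshot + overlay + compaction.
class Snapshot:
    def __init__(self, base_edges, overlay=()):
        self.base_edges = base_edges   # stands in for the immutable base CSR
        self.overlay = tuple(overlay)  # accumulated delta edges

current = Snapshot(base_edges=[(0, 1), (1, 2)])
COMPACTION_THRESHOLD = 3               # engine default is 10,000

def commit(delta_edges):
    global current
    overlay = current.overlay + tuple(delta_edges)
    if len(overlay) >= COMPACTION_THRESHOLD:
        # Compaction: fold the overlay into a fresh base, start an empty overlay.
        current = Snapshot(current.base_edges + list(overlay))
    else:
        # Readers see old or new snapshot atomically: a single reference swap.
        current = Snapshot(current.base_edges, overlay)

commit([(2, 0)])
assert current.overlay == ((2, 0),)
commit([(0, 2), (1, 0)])
assert current.overlay == ()           # threshold reached -> compacted
assert len(current.base_edges) == 5
```

Readers never take a lock: they dereference whatever `current` points to, which is always a complete, consistent snapshot.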
How CSR Works
The graph topology is stored as two pairs of arrays (forward for outgoing edges, backward for incoming):
Forward CSR (outgoing edges):
  offsets:   [0, 3, 5, 8, ...]     -- one entry per vertex + sentinel
  neighbors: [1, 5, 7, 2, 6, ...]  -- dense neighbor IDs, contiguous per source

Outgoing neighbors of vertex v = neighbors[offsets[v] .. offsets[v+1])
Out-degree of vertex v = offsets[v+1] - offsets[v]  -- O(1)
This layout enables sequential memory access (cache-line friendly) and O(1) degree lookups.
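The lookups above translate directly into array slicing. A minimal Python model follows (illustrative only; the engine stores these as flat Java int[] arrays):

```python
# Toy CSR for a 3-vertex graph with edges: 0->1, 0->2, 1->2
offsets = [0, 2, 3, 3]       # one entry per vertex + sentinel
neighbors = [1, 2, 2]        # neighbor IDs, contiguous per source vertex

def out_neighbors(v):
    # Outgoing neighbors of v = neighbors[offsets[v] .. offsets[v+1])
    return neighbors[offsets[v]:offsets[v + 1]]

def out_degree(v):
    # O(1): just two array reads
    return offsets[v + 1] - offsets[v]

print(out_neighbors(0))  # [1, 2]
print(out_degree(1))     # 1
```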
Columnar Property Storage
Properties are stored as typed flat arrays — int[], long[], double[], or dictionary-encoded int[] for strings.
Each column has a compact null bitmap (1 bit per vertex).
Dictionary encoding maps unique string values to integer codes, achieving near-100% compression for low-cardinality fields.
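A sketch of dictionary encoding in Python: each distinct string gets an integer code, so a million-row column with three distinct values stores only small ints plus a three-entry dictionary.

```python
# Dictionary-encode a low-cardinality string column into int codes.
values = ["active", "inactive", "active", "active", "banned", "inactive"]

dictionary = {}   # string -> code, in first-seen order
codes = []        # one small int per vertex
for v in values:
    codes.append(dictionary.setdefault(v, len(dictionary)))

print(dictionary)  # {'active': 0, 'inactive': 1, 'banned': 2}
print(codes)       # [0, 1, 0, 0, 2, 1]

# Decoding: invert the dictionary
decode = {c: s for s, c in dictionary.items()}
assert [decode[c] for c in codes] == values
```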
Memory Usage
The OLAP representation is significantly more compact than the OLTP equivalent:
-
CSR topology: ~8 bytes per edge (bidirectional)
-
Node ID mapping: ~8 bytes per vertex
-
Columnar properties: 4–8 bytes per vertex per column
-
Null bitmaps: 1 bit per vertex per column
Example: for a graph with 500K vertices and 8M edges, the GAV uses 134.6 MB compared to an estimated ~1.2 GB for the OLTP representation — 9.3x more compact.
long bytes = gav.getMemoryUsageBytes();
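As a back-of-envelope check, the per-item costs listed above can be plugged into a short calculation. The column count here is a hypothetical value, and this simple model omits per-structure overheads, so it lands below the measured 134.6 MB:

```python
vertices = 500_000
edges = 8_000_000
columns = 3                             # hypothetical: three materialized properties

csr_topology = edges * 8                # ~8 bytes per edge (bidirectional)
id_mapping = vertices * 8               # ~8 bytes per vertex
properties = vertices * 8 * columns     # upper bound: 8 bytes/vertex/column
null_bitmaps = vertices * columns // 8  # 1 bit per vertex per column

total_mb = (csr_topology + id_mapping + properties + null_bitmaps) / 1024**2
print(f"{total_mb:.1f} MB")  # 76.5 MB with these inputs
```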
Graph Algorithms
The module includes parallelized graph algorithms that operate directly on CSR arrays with zero GC pressure:
| Algorithm | Description |
|---|---|
| PageRank | Pull-based, parallel, configurable damping factor and iterations |
| Connected Components | Min-label propagation for weakly connected components |
| BFS | Breadth-first search with distance arrays |
| SSSP (Dijkstra) | Single-source shortest path for weighted graphs |
| Label Propagation | Community detection |
| Triangle Counting | Count 3-cliques in the graph |
| Local Clustering Coefficient | Per-vertex clustering coefficients |
GraphAlgorithms algos = new GraphAlgorithms();
// PageRank (20 iterations, damping 0.85)
double[] ranks = algos.pageRank(gav, 20, 0.85);
// Connected Components
int[] components = algos.connectedComponents(gav);
// BFS from a source vertex
int[] distances = algos.bfs(gav, sourceNodeId);
Query Planner Integration
The Cypher query planner automatically detects ready GAVs and substitutes OLTP traversal operators with CSR-based operators when:
-
A named GAV is registered and in READY state
-
The GAV covers the required vertex and edge types
-
The query does not return edge variables as first-class records (edges in CSR have no RID; edge properties are fully supported)
No query changes are needed — the optimizer transparently accelerates matching traversal patterns.
Lifecycle
// Check status
if (gav.isReady()) { /* safe to query */ }
// Status values: NOT_BUILT, BUILDING, READY, STALE
Status status = gav.getStatus();
// Drop (removes from registry + schema)
gav.drop();
// Shutdown (release resources, schema definition persists)
gav.shutdown();
Benchmark Results
On a graph with 500K vertices and ~8M edges:
| Benchmark | OLTP | OLAP | Speedup |
|---|---|---|---|
| 1-hop count | 6.9 µs | 1.2 µs | 5.7x |
| 2-hop | 101.4 µs | 5.1 µs | 19.8x |
| 3-hop | 1,037 µs | 56.4 µs | 18.4x |
| 5-hop | 194,046 µs | 5,141 µs | 37.7x |
| Shortest Path | 394 ms/pair | 7.5 ms/pair | 52.8x |
| PageRank (20 iter) | 124,563 ms | 316 ms | 394.2x |
| Connected Components | 5,591 ms | 197 ms | 28.4x |
| Label Propagation | 62,619 ms | 645 ms | 97.1x |
Limitations
-
CSR uses int[] arrays — maximum ~2.1 billion vertices per bucket and ~2.1 billion edges per direction
-
Edges in CSR do not carry their own RID; the Cypher query planner falls back to OLTP only when the query returns an edge variable as a first-class record (e.g., RETURN r). Edge properties are fully supported via withEdgeProperties()
-
Dictionary encoding applies only to string properties
-
Initial build requires a full scan of selected vertex/edge types
5.2.1. Vector Embeddings
This guide covers practical decisions for working with vector embeddings in ArcadeDB: choosing dimensions, creating indexes, tuning parameters, and combining vector search with other query types.
Choosing an Embedding Model
Your embedding model determines the dimensions parameter for the index:
| Model | Dimensions | Notes |
|---|---|---|
| OpenAI | 1536 | General purpose, high quality |
| OpenAI | 3072 | Highest quality, largest memory footprint |
| Sentence Transformers | 384 | Fast, open source, good quality |
| Sentence Transformers | 768 | Better quality, slower |
| Cohere | 1024 | Good balance of quality and size |
| CLIP (image + text) | 512 | Multi-modal image/text |
| Start with 384 dimensions (MiniLM) for prototyping. Move to 768+ for production quality. Use quantization to manage memory at higher dimensions. |
Creating a Vector Index
Recommended index creation with INT8 quantization:
CREATE VERTEX TYPE Document
CREATE PROPERTY Document.content STRING
CREATE PROPERTY Document.embedding LIST OF FLOAT
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE',
quantization: 'INT8'
}
INT8 quantization is recommended for all production workloads. It provides 2.5x faster search and 4x lower memory usage with negligible accuracy loss (see Why INT8 is faster). Only omit quantization for very small datasets (< 10K vectors) where maximum precision matters.
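To illustrate what INT8 quantization does, here is a minimal symmetric-quantization sketch in Python. ArcadeDB's internal scheme is not necessarily identical, but the idea is the same: trade one byte per dimension for a bounded rounding error.

```python
def quantize_int8(vec):
    # Symmetric quantization: map the float range to [-127, 127].
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

vec = [0.12, -0.54, 0.33, 0.91]
codes, scale = quantize_int8(vec)
print(codes)  # [17, -75, 46, 127] -- 1 byte each instead of 4

# Each reconstructed value is within one quantization step of the original.
approx = dequantize(codes, scale)
assert all(abs(a - b) <= scale for a, b in zip(vec, approx))
```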
Production-ready index with additional tuning:
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE',
quantization: 'INT8',
maxConnections: 16,
beamWidth: 100
}
Choosing a Similarity Function
| Function | Choose When | Avoid When |
|---|---|---|
| COSINE | Using text embedding models (most common). Vectors may have varying magnitudes. | Vectors represent absolute quantities (distances, counts). |
| DOT_PRODUCT | Vectors are already L2-normalized. You need maximum query speed. | Vectors are not normalized (results will be incorrect). |
| EUCLIDEAN | Working with spatial data, sensor readings, or continuous measurements. | Comparing text embeddings of different lengths. |
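To see why the choice matters, compare the three metrics on two vectors that point in the same direction but differ in magnitude (plain-Python sketch):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]      # same direction as a, twice the magnitude

print(cosine(a, b))      # ~1.0  -- identical direction, magnitude ignored
print(dot(a, b))         # 28.0  -- inflated by magnitude
print(euclidean(a, b))   # ~3.74 -- sensitive to absolute distance
```

This is why DOT_PRODUCT only gives meaningful rankings on L2-normalized vectors: without normalization, longer vectors win regardless of direction.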
Quantization Trade-offs
Use INT8 quantization for most use cases. It provides 4x memory savings with minimal accuracy loss and significantly faster ingestion and search:
-
< 10K vectors: NONE is fine, but INT8 works well too
-
10K - 1M vectors: Use INT8 (4x memory savings, < 2% accuracy loss) — recommended
-
> 1M vectors: Use INT8 for general use, or PRODUCT for zero-disk-I/O graph construction on very large datasets
-
Extreme compression: Use BINARY for first-pass filtering, then rerank with full vectors
-- INT8: recommended for most workloads
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 768,
similarity: 'COSINE',
quantization: 'INT8'
}
-- PRODUCT: for very large datasets, enables in-memory graph build
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 1024,
similarity: 'COSINE',
quantization: 'PRODUCT'
}
Tuning for Recall vs Speed
Adjust maxConnections and beamWidth based on your priorities:
| Profile | maxConnections | beamWidth | Trade-off |
|---|---|---|---|
| Default | 16 | 100 | Balanced for most workloads |
| High recall | 32 | 200 | Better accuracy, 2-3x slower builds, 50% more memory |
| Fast indexing | 12 | 80 | 2x faster builds, 5-10% lower recall |
| Memory constrained | 8 | 60 | Minimal memory footprint |
For datasets over 100K vectors or with 1024+ dimensions, enable hierarchical mode:
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 1536,
similarity: 'COSINE',
quantization: 'INT8',
addHierarchy: true,
maxConnections: 32,
beamWidth: 200
}
Tuning efSearch
The efSearch parameter controls how many candidates the search explores at query time. By default, ArcadeDB uses an adaptive strategy that works well for most workloads. You only need to tune efSearch if you have specific recall or latency requirements.
| Profile | efSearch | Trade-off |
|---|---|---|
| Adaptive (default) | auto | Two-pass: fast first pass |
| High recall | 200-500 | Consistent high accuracy, higher latency |
| Low latency | 20-50 | Fast responses, lower recall on hard queries |
You can override efSearch per-query without changing the index:
-- High recall for a critical search
SELECT expand(vectorNeighbors('Doc[embedding]', $queryVector, 10, 500))
-- Low latency for autocomplete/typeahead
SELECT expand(vectorNeighbors('Doc[embedding]', $queryVector, 5, 30))
Or set a default on the index:
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 768,
similarity: 'COSINE',
quantization: 'INT8',
efSearch: 200
}
Multi-Modal Embeddings
Store multiple embeddings per record for different search modalities:
CREATE VERTEX TYPE Product
CREATE PROPERTY Product.imageEmbedding ARRAY_OF_FLOATS
CREATE PROPERTY Product.textEmbedding ARRAY_OF_FLOATS
CREATE INDEX ON Product (imageEmbedding) LSM_VECTOR METADATA {dimensions: 512, similarity: 'COSINE'}
CREATE INDEX ON Product (textEmbedding) LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE'}
Query each index independently:
-- Search by image similarity
SELECT name, distance FROM (
SELECT expand(vectorNeighbors('Product[imageEmbedding]', $imageVector, 10))
)
-- Search by text similarity
SELECT name, distance FROM (
SELECT expand(vectorNeighbors('Product[textEmbedding]', $textVector, 10))
)
Hybrid Search: Vector + Full-Text
Combine vector similarity with keyword matching for best results:
-- Step 1: Full-text search for keyword matches
SELECT @rid, title, content FROM Document
WHERE SEARCH_INDEX('Document[content]', 'machine learning')
-- Step 2: Vector search for semantic matches
SELECT @rid, title, distance FROM (
SELECT expand(vectorNeighbors('Document[embedding]', $queryVector, 20))
)
-- Combine scores using reciprocal rank fusion
SELECT vectorRRFScore(keywordRank, vectorRank, 60) AS score
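Assuming vectorRRFScore follows the standard reciprocal rank fusion formula with constant k = 60, the combination can be sketched as:

```python
def rrf_score(keyword_rank, vector_rank, k=60):
    # Standard reciprocal rank fusion; ranks are 1-based positions in each list.
    return 1.0 / (k + keyword_rank) + 1.0 / (k + vector_rank)

# A document ranked 1st by keywords and 3rd by vectors beats one
# ranked 5th by keywords and 2nd by vectors:
print(rrf_score(1, 3))  # ~0.0323
print(rrf_score(5, 2))  # ~0.0315
assert rrf_score(1, 3) > rrf_score(5, 2)
```

Because RRF works on rank positions rather than raw scores, the keyword and vector result lists need no score normalization before fusing.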
Batch Ingestion
For bulk loading vectors, batch your inserts within transactions:
BEGIN
CREATE VERTEX Document SET content = 'First document', embedding = [0.1, 0.2, ...]
CREATE VERTEX Document SET content = 'Second document', embedding = [0.3, 0.4, ...]
-- ... more inserts ...
COMMIT
For large bulk loads, increase mutationsBeforeRebuild to delay index rebuilds until after the load completes, then trigger a rebuild.
|
When vectors are inserted below the rebuild threshold, an inactivity timer ensures the graph is still rebuilt after a period of no new mutations (default: 15 seconds). This prevents buffered vectors from remaining in the brute-force delta buffer indefinitely during low-volume ingestion. Configure via inactivityRebuildTimeoutMs (per-index metadata or arcadedb.vectorIndex.inactivityRebuildTimeoutMs globally). Set to 0 to disable.
|
If you create the index before inserting data (e.g., during schema setup), set buildGraphNow: false to skip the initial (empty) graph build. The graph will be built lazily on the first search:
-- Schema setup phase: defer graph build since no data exists yet
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE',
quantization: 'INT8',
buildGraphNow: false
}
-- Bulk load data...
-- Graph is built automatically on first vectorNeighbors() query
If you create the index after data is already loaded, leave buildGraphNow at its default (true) so the index is immediately ready to query.
Global Configuration
Set database-wide defaults for vector index parameters:
ALTER DATABASE `arcadedb.vectorIndex.locationCacheSize` 100000
ALTER DATABASE `arcadedb.vectorIndex.graphBuildCacheSize` 10000
ALTER DATABASE `arcadedb.vectorIndex.mutationsBeforeRebuild` 100
ALTER DATABASE `arcadedb.vectorIndex.inactivityRebuildTimeoutMs` 15000
ALTER DATABASE `arcadedb.vectorIndex.storeVectorsInGraph` false
Per-index metadata overrides these global settings.
Further Reading
-
Vector Search Concepts — Architecture and algorithm details
-
Vector Search Tutorial — Step-by-step hands-on guide
-
Java Vector API — Programmatic index management
-
SQL Vector Functions — Complete function reference
5.3. Connectivity
-
JDBC Driver - Connect from Java via the Postgres JDBC driver
-
Python - HTTP, PostgreSQL, and embedded bindings
-
Postgres Protocol Plugin - PostgreSQL wire protocol support
-
Neo4j BOLT Protocol Plugin - Cypher queries via BOLT protocol
-
C#/.NET (HTTP/JSON) - Connect from .NET applications
-
Elixir (HTTP/JSON) - Connect from Elixir applications
-
Node.js / JavaScript - Connect from Node.js applications
-
Grafana - Native datasource plugin written in Go
5.3.1. JDBC Driver
If you’re using Java, you can use the Postgres JDBC driver. This requires the Postgres plugin to be loaded by the server.
Class.forName("org.postgresql.Driver");
Properties props = new Properties();
props.setProperty("user", "user");
props.setProperty("password", "password");
props.setProperty("ssl", "false");
try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb", props) ) {
try (Statement st = conn.createStatement()) {
st.executeQuery("create vertex type Hero");
st.executeQuery("create vertex Hero set name = 'Jay', lastName = 'Miner'");
PreparedStatement pst = conn.prepareStatement("create vertex Hero set name = ?, lastName = ?");
pst.setString(1, "Rocky");
pst.setString(2, "Balboa");
pst.execute();
pst.close();
try( ResultSet rs = st.executeQuery("SELECT * FROM Hero") ) { // Type and property names are case sensitive!
while (rs.next()) {
System.out.println("First Name: " + rs.getString(1) + " - Last Name: " + rs.getString(2));
}
}
}
}
5.3.2. Python
ArcadeDB can be accessed from Python in three ways:
| Method | Best For | Library |
|---|---|---|
| PostgreSQL protocol | Most applications | psycopg |
| HTTP/JSON API | Lightweight scripts, serverless | requests |
| Embedded bindings | High-performance, in-process | arcadedb-embedded-python |
PostgreSQL Protocol (Recommended)
ArcadeDB speaks the PostgreSQL wire protocol natively. Use any PostgreSQL driver — no special SDK needed.
pip install "psycopg[binary]>=3.1,<4"
import psycopg
conn = psycopg.connect(
host='localhost',
port=5432,
dbname='mydb',
user='root',
password='arcadedb',
autocommit=True,
)
with conn.cursor() as cur:
# SQL queries
cur.execute("SELECT FROM V LIMIT 10")
for row in cur.fetchall():
print(row)
# Cypher queries (prefix with {cypher})
cur.execute("{cypher} MATCH (n) RETURN n LIMIT 10")
for row in cur.fetchall():
print(row)
# Graph traversal with MATCH
cur.execute("""
SELECT person.name, friend.name
FROM MATCH {type: Person, as: person}
-Knows-> {type: Person, as: friend}
""")
for row in cur.fetchall():
print(f'{row[0]} knows {row[1]}')
conn.close()
Use autocommit=True for DDL operations (CREATE, ALTER, DROP). For transactional workloads, use the default mode and call conn.commit().
|
HTTP/JSON API
For lightweight scripts or serverless environments where you want minimal dependencies:
import requests
base_url = 'http://localhost:2480/api/v1'
auth = ('root', 'arcadedb')
# Run a SQL query
response = requests.post(
f'{base_url}/command/mydb',
json={'language': 'sql', 'command': 'SELECT FROM V LIMIT 10'},
auth=auth,
)
print(response.json())
# Run a Cypher query
response = requests.post(
f'{base_url}/command/mydb',
json={'language': 'cypher', 'command': 'MATCH (n) RETURN n LIMIT 10'},
auth=auth,
)
print(response.json())
Embedded Python Bindings
For high-performance use cases that need in-process access (no network overhead), see the arcadedb-embedded-python project which provides direct JVM bindings via JPype.
Further Reading
-
Python Quickstart Tutorial — Step-by-step tutorial
-
IAM Use Case — Full Python implementation with 7 query patterns
5.3.3. Postgres Protocol Plugin
ArcadeDB Server supports a subset of the Postgres wire protocol, including connection handling and queries.
If you’re using ArcadeDB embedded, add a dependency on the arcadedb-postgresw library.
If you’re using Maven include this dependency in your pom.xml file.
<dependency>
<groupId>com.arcadedb</groupId>
<artifactId>arcadedb-postgresw</artifactId>
<version>26.3.1</version>
</dependency>
To start the Postgres plugin, enlist it in the server.plugins settings.
To specify multiple plugins, use the comma , as separator.
Example:
~/arcadedb $ bin/server.sh -Darcadedb.server.plugins="Postgres:com.arcadedb.postgres.PostgresProtocolPlugin"
If you’re using MS Windows OS, replace server.sh with server.bat.
In case of an incompatibility, restart the server with the additional option -Darcadedb.postgres.debug=true, repeat the connection attempt, and add the debug output to the issue report.
|
In case you’re running ArcadeDB with Docker, use -e to pass settings and open the Postgres default port 5432:
docker run --rm -p 2480:2480 -p 2424:2424 -p 5432:5432 \
--env JAVA_OPTS="-Darcadedb.server.rootPassword=playwithdata \
-Darcadedb.server.plugins=Postgres:com.arcadedb.postgres.PostgresProtocolPlugin " \
arcadedata/arcadedb:latest
The Server output will contain this line:
2021-07-08 19:05:06.081 INFO [ArcadeDBServer] <ArcadeDB_0> - Postgres Protocol plugin started
Once you have enabled the Postgres Protocol, you can interact with ArcadeDB server by using any Postgres drivers. The driver sends the queries to the ArcadeDB server without parsing or checking the syntax. For this reason, even if ArcadeDB SQL is different from Postgres SQL, you’re still able to execute any ArcadeDB SQL command through the Postgres driver. Check out the following list with the official drivers for the most popular programming languages:
For the complete list, please check the Postgres website.
Other query languages
By default the Postgres driver interprets all the commands as SQL. To use another supported language, like Cypher, Gremlin, GraphQL or MongoDB, prefix the command with the language to use between curly brackets.
Example to execute a query by using GraphQL:
{graphql}{ bookById(id: "book-1"){ id name authors { firstName, lastName } } }
Example to use Cypher:
{cypher}MATCH (m:Movie)<-[a:ACTED_IN]-(p:Person) WHERE id(m) = '#1:0' RETURN *
Example of using Gremlin:
{gremlin}g.V()
Current limitations
The publicly available documentation of the Postgres wire protocol is not exhaustive enough to build a bulletproof implementation, in particular regarding its state machine. For this reason, this plugin was created by reading the documentation available online (official and unofficial) and by looking into existing Postgres drivers and implementations.
| In particular, ArcadeDB only supports the "simple" query mode and does not support SSL! |
Transactions
Setting auto-commit to false is not fully supported. With JDBC, leave the default settings or set:
conn.setAutoCommit(true);
Postgres Tools Known to Work
| Some tools compatible with Postgres may execute queries on internal Postgres tables to retrieve the schema. Those tables are not present in ArcadeDB, so it may return errors at startup. See tested compatible tools below. If the tool that you use to work with Postgres is not compatible with ArcadeDB, please open an issue. |
PostgreSQL Client psql
Postgres’s psql tool works out of the box, just like with an actual Postgres server.
To install this Postgres client, see here.
Connect from a terminal or console like this:
psql -h localhost -p 5432 -d mydatabase -U root
After authenticating, you can run SQL queries as normal. One can also submit the password via the environment:
PGPASSWORD=password psql -h localhost -p 5432 -d mydatabase -U root
or use the postgres protocol address:
psql postgres://username:password@host:port/database
In case the password contains special characters (like /, \, @, ?, !, &),
it needs to be URL encoded (also known as "percent encoding").
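For example, a password can be percent-encoded with Python's standard library (the password shown is hypothetical):

```python
from urllib.parse import quote

password = "p@ss/w?rd!"                 # hypothetical password with special characters
encoded = quote(password, safe="")      # safe="" also encodes "/"
print(encoded)                          # p%40ss%2Fw%3Frd%21

url = f"postgres://root:{encoded}@localhost:5432/mydatabase"
print(url)
```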
Note that in the psql console, queries or commands need to be terminated with a semicolon ; to be submitted.
5.3.4. Neo4j BOLT Protocol Plugin
|
BOLT protocol support is available starting from ArcadeDB version 26.2.1. |
ArcadeDB Server supports the Neo4j BOLT protocol, enabling connectivity from any BOLT-compatible client or driver. This allows you to use the official Neo4j drivers with ArcadeDB, leveraging the native OpenCypher query engine for graph operations.
The BOLT protocol implementation supports:
-
BOLT v3.0, v4.0, and v4.4 protocol versions
-
Full Cypher query support via ArcadeDB’s native OpenCypher implementation
-
Parameterized queries for security and performance
-
Explicit transactions with BEGIN/COMMIT/ROLLBACK
-
Multi-database support with database selection per connection or query
-
Multi-label vertices following Neo4j’s node labeling conventions
Setup
If you’re using ArcadeDB as embedded, add the dependency to the arcadedb-bolt library.
If you’re using Maven, include this dependency in your pom.xml file:
<dependency>
<groupId>com.arcadedb</groupId>
<artifactId>arcadedb-bolt</artifactId>
<version>26.3.1</version>
</dependency>
To start the BOLT plugin, enlist it in the server.plugins settings.
To specify multiple plugins, use the comma , as separator.
Example:
~/arcadedb $ bin/server.sh -Darcadedb.server.plugins="Bolt:com.arcadedb.bolt.BoltProtocolPlugin"
If you’re using MS Windows OS, replace server.sh with server.bat.
In case you’re running ArcadeDB with Docker, use -e to pass settings and open the BOLT default port 7687:
docker run --rm -p 2480:2480 -p 2424:2424 -p 7687:7687 \
--env JAVA_OPTS="-Darcadedb.server.rootPassword=playwithdata \
-Darcadedb.server.plugins=Bolt:com.arcadedb.bolt.BoltProtocolPlugin " \
arcadedata/arcadedb:latest
The Server output will contain this line:
INFO [ArcadeDBServer] - Bolt Protocol plugin started (host=0.0.0.0 port=7687)
Configuration
The BOLT plugin supports the following configuration options:
| Setting | Default | Description |
|---|---|---|
| arcadedb.bolt.port | 7687 | TCP/IP port for BOLT connections |
| | 0.0.0.0 | Host/IP address to bind to |
| arcadedb.bolt.defaultDatabase | (none) | Default database when not specified by client |
| | 0 | Maximum concurrent connections (0 = unlimited) |
| | 300 | Time-to-live in seconds for routing table entries |
| arcadedb.bolt.debug | false | Enable BOLT protocol debug logging |
Example configuration:
bin/server.sh -Darcadedb.server.plugins="Bolt:com.arcadedb.bolt.BoltProtocolPlugin" \
-Darcadedb.bolt.port=7687 \
-Darcadedb.bolt.defaultDatabase=mydatabase
Compatible Drivers
The Neo4j official drivers are open source and licensed under Apache 2.0. You can use them with ArcadeDB’s BOLT protocol implementation:
| Language | Driver | Installation |
|---|---|---|
| Java | neo4j-java-driver | Maven: org.neo4j.driver:neo4j-java-driver |
| Python | neo4j | pip install neo4j |
| JavaScript | neo4j-driver | npm install neo4j-driver |
| .NET | Neo4j.Driver | NuGet: Neo4j.Driver |
| Go | neo4j-go-driver | go get github.com/neo4j/neo4j-go-driver/v5 |
For the complete list of community drivers, check the Neo4j Driver documentation.
Java Example
import org.neo4j.driver.*;
// Create driver (without encryption for local development)
Driver driver = GraphDatabase.driver(
"bolt://localhost:7687",
AuthTokens.basic("root", "playwithdata"),
Config.builder().withoutEncryption().build()
);
// Open session for specific database
try (Session session = driver.session(SessionConfig.forDatabase("mydatabase"))) {
// Execute a simple query
Result result = session.run("MATCH (n:Person) RETURN n.name AS name LIMIT 10");
while (result.hasNext()) {
Record record = result.next();
System.out.println(record.get("name").asString());
}
// Execute parameterized query
Result paramResult = session.run(
"MATCH (p:Person) WHERE p.age >= $minAge RETURN p.name, p.age",
Values.parameters("minAge", 25)
);
// Execute in explicit transaction
try (Transaction tx = session.beginTransaction()) {
tx.run("CREATE (n:Person {name: $name, age: $age})",
Values.parameters("name", "Alice", "age", 30));
tx.run("CREATE (n:Person {name: $name, age: $age})",
Values.parameters("name", "Bob", "age", 25));
tx.commit();
}
}
driver.close();
Python Example
from neo4j import GraphDatabase
# Create driver
driver = GraphDatabase.driver(
"bolt://localhost:7687",
auth=("root", "playwithdata")
)
# Query example
with driver.session(database="mydatabase") as session:
# Simple query
result = session.run("MATCH (n:Person) RETURN n.name AS name LIMIT 10")
for record in result:
print(record["name"])
# Parameterized query
result = session.run(
"MATCH (p:Person) WHERE p.age >= $minAge RETURN p.name, p.age",
minAge=25
)
# Explicit transaction
with session.begin_transaction() as tx:
tx.run("CREATE (n:Person {name: $name, age: $age})", name="Alice", age=30)
tx.run("CREATE (n:Person {name: $name, age: $age})", name="Bob", age=25)
tx.commit()
driver.close()
JavaScript Example
const neo4j = require('neo4j-driver');
// Create driver
const driver = neo4j.driver(
'bolt://localhost:7687',
neo4j.auth.basic('root', 'playwithdata')
);
// Query example
const session = driver.session({ database: 'mydatabase' });
try {
// Simple query
const result = await session.run('MATCH (n:Person) RETURN n.name AS name LIMIT 10');
result.records.forEach(record => {
console.log(record.get('name'));
});
// Parameterized query
const paramResult = await session.run(
'MATCH (p:Person) WHERE p.age >= $minAge RETURN p.name, p.age',
{ minAge: 25 }
);
// Explicit transaction
const tx = session.beginTransaction();
await tx.run('CREATE (n:Person {name: $name, age: $age})', { name: 'Alice', age: 30 });
await tx.run('CREATE (n:Person {name: $name, age: $age})', { name: 'Bob', age: 25 });
await tx.commit();
} finally {
await session.close();
await driver.close();
}
Cypher Query Examples
Since BOLT protocol uses Cypher as its query language, you can execute any query supported by ArcadeDB’s OpenCypher implementation:
// Create vertices
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 25})
// Create relationship
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:KNOWS {since: 2020}]->(b)
// Query with pattern matching
MATCH (p:Person)-[:KNOWS]->(friend)
WHERE p.age > 20
RETURN p.name, friend.name
// Variable-length paths
MATCH path = (start:Person)-[:KNOWS*1..3]->(end:Person)
RETURN path
// Aggregations
MATCH (p:Person)
RETURN avg(p.age) AS averageAge, count(p) AS totalPeople
Transactions
BOLT protocol supports explicit transactions:
-
Auto-commit mode: Single queries outside a transaction are automatically committed
-
Explicit transactions: Use BEGIN/COMMIT/ROLLBACK for multi-statement transactions
-
Rollback on error: Transactions are automatically rolled back if an error occurs
// Auto-commit (implicit transaction)
session.run("CREATE (n:Person {name: 'Charlie'})");
// Explicit transaction with multiple operations
try (Transaction tx = session.beginTransaction()) {
tx.run("CREATE (a:Person {name: 'David'})");
tx.run("CREATE (b:Person {name: 'Eve'})");
tx.run("MATCH (a:Person {name: 'David'}), (b:Person {name: 'Eve'}) CREATE (a)-[:FRIENDS]->(b)");
tx.commit();
}
// Transaction rollback example
try (Transaction tx = session.beginTransaction()) {
tx.run("CREATE (n:Person {name: 'Frank'})");
// Rollback - changes will not be persisted
tx.rollback();
}
Current Limitations
-
TLS/SSL: The current implementation does not support encrypted connections. Use network-level security (VPN, SSH tunnel) for production deployments requiring encryption.
-
Routing: Single-server routing only. In cluster deployments, the routing table returns the connected server for all roles (READ, WRITE, ROUTE).
Troubleshooting
If you encounter connection issues:
-
Enable debug logging: Start the server with
-Darcadedb.bolt.debug=true to see detailed protocol messages.
-
Check port availability: Ensure port 7687 (or your configured port) is not in use by another service.
-
Verify authentication: ArcadeDB requires authentication. Ensure you’re providing valid credentials.
-
Disable encryption in drivers: If using Neo4j drivers, configure them to connect without encryption since TLS is not yet supported.
For Java driver:
Config.builder().withoutEncryption().build()
For Python driver:
driver = GraphDatabase.driver(uri, auth=auth, encrypted=False)
5.3.5. C#/.NET (HTTP/JSON)
Sample Code
In C#/.NET 7.0/8.0, HTTP/JSON requests can be made using the HttpClient class inside an async method (prefer returning Task over void so callers can await the call and observe exceptions).
In real applications, the HttpClient object should not be created and discarded on each request: create it once for the lifetime of the application, store it in a singleton or static reference, and reuse it for each request. Here it is created inside the method only for simplicity.
The following example demonstrates a simple function which will add a record of the type Profile to a database named mydb with the name Alexander.
// requires: using System.Net.Http; using System.Text; using System.Diagnostics;
public async Task AddProfileNameAsync(){
HttpClient httpClient = new(); //typically instantiate this only once over the course of the application, then reuse
HttpRequestMessage msg = new();
msg.Method = HttpMethod.Post;
string authString = "root:arcadedb-password"; //add your password here
string base64AuthString = Convert.ToBase64String(Encoding.ASCII.GetBytes(authString));
msg.Headers.Authorization = new("Basic", base64AuthString);
msg.RequestUri = new Uri("http://serveraddress:2480/api/v1/command/mydb"); //add your server address (or localhost) and db name
HttpContent httpContent = new StringContent("{ \"language\": \"sql\", \"command\": \"INSERT into Profile set name = 'Alexander'\" }", Encoding.UTF8, "application/json"); //customize command here
msg.Content = httpContent;
HttpResponseMessage response = await httpClient.SendAsync(msg);
string responseString = await response.Content.ReadAsStringAsync();
Debug.WriteLine("SENT REQUEST, RESPONSE: " + responseString);
}
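The same request can be sketched in Python with only the standard library, which makes the moving parts explicit: a Base64-encoded Basic auth header and a JSON body with language and command fields. The server address and database name are placeholders, exactly as in the C# example above; the request is only built and inspected here, not sent.

```python
import base64
import json
import urllib.request

# Build the same HTTP/JSON command request with Python's standard library.
# "serveraddress" and "mydb" are placeholders, as in the C# example above.
auth = base64.b64encode(b"root:arcadedb-password").decode()
payload = json.dumps({"language": "sql",
                      "command": "INSERT into Profile set name = 'Alexander'"})
req = urllib.request.Request(
    "http://serveraddress:2480/api/v1/command/mydb",
    data=payload.encode("utf-8"),
    headers={"Authorization": "Basic " + auth,
             "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request; it is not called here.
print(req.get_header("Authorization"))
```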
5.3.6. Elixir (HTTP/JSON)
Configuration
Ensure the Phoenix Elixir framework is installed as per official instructions.
A test project named testproject can then be created in a given folder by running mix phx.new testproject. Various options during project creation are available.
A package such as HTTPoison must next be added to perform HTTP Requests.
Open the newly created mix.exs file and add the line {:httpoison, "~> 2.2"} (check current version of the package as indicated on the HTTPoison Package site):
defp deps do
[
{:phoenix, "~> 1.7.10"},
# ... other packages
{:httpoison, "~> 2.2"}
]
end
Save and close mix.exs. Run mix deps.get to update the project and download HTTPoison into this project.
To start an interactive prompt, enter cd testproject and then iex -S mix phx.server. This will begin an Interactive Elixir command prompt as indicated by iex()>.
Sample Code
A simple HTTP request can then be performed by running the following commands sequentially:
userPass = "root:arcadedb-password"
base64UserPass = Base.encode64(userPass)
authString = "Basic " <> base64UserPass
url = "http://serveraddress:2480/api/v1/command/mydb"
body = Jason.encode!(%{language: "sql", command: "SELECT from Profile"})
headers = [{"Authorization", authString}, {"Content-Type", "application/json"}]
HTTPoison.post(url, body, headers)
To process returned data, one can use the following approach:
case HTTPoison.post(url, body, headers) do
{:ok, %{status_code: 200, body: body}} ->
# do something with the body
Jason.decode!(body)
{:ok, %{status_code: 404}} ->
# do something with a 404
IO.puts("error404")
{:error, %{reason: reason}} ->
# do something with an error
IO.puts(reason)
end
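Independent of language, handling a successful response comes down to decoding the JSON envelope: the command endpoint returns rows under a result key. A minimal Python sketch (the body shown is a made-up example for illustration, not actual server output):

```python
import json

# Hypothetical response body; the command endpoint wraps rows in "result".
body = '{"result": [{"name": "Alexander"}]}'
rows = json.loads(body).get("result", [])
for row in rows:
    print(row["name"])
```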
5.3.7. Node.js / JavaScript
ArcadeDB can be accessed from Node.js in two ways:
| Method | Best For | Library |
|---|---|---|
| PostgreSQL protocol | Most applications | pg |
| HTTP/JSON API | Lightweight scripts, serverless, browsers | fetch / axios |
PostgreSQL Protocol (Recommended)
ArcadeDB speaks the PostgreSQL wire protocol natively. Use the standard pg client.
npm install pg
const { Client } = require('pg');
const client = new Client({
host: 'localhost',
port: 5432,
database: 'mydb',
user: 'root',
password: 'arcadedb',
});
await client.connect();
// SQL queries
const result = await client.query('SELECT FROM V LIMIT 10');
console.log(result.rows);
// Cypher queries (prefix with {cypher})
const cypher = await client.query('{cypher} MATCH (n) RETURN n LIMIT 10');
console.log(cypher.rows);
// Graph traversal with MATCH
const graph = await client.query(`
SELECT person.name, friend.name
FROM MATCH {type: Person, as: person}
-Knows-> {type: Person, as: friend}
`);
for (const row of graph.rows) {
console.log(`${row['person.name']} knows ${row['friend.name']}`);
}
await client.end();
HTTP/JSON API
For lightweight scripts, serverless functions, or browser-side code:
Using fetch (Node.js 18+ or browsers)
const base = 'http://localhost:2480/api/v1';
const auth = 'Basic ' + btoa('root:arcadedb');
const response = await fetch(`${base}/command/mydb`, {
method: 'POST',
headers: {
'Authorization': auth,
'Content-Type': 'application/json',
},
body: JSON.stringify({
language: 'sql',
command: 'SELECT FROM V LIMIT 10',
}),
});
const data = await response.json();
console.log(data);
Using axios
npm install axios
const axios = require('axios');
const auth = 'Basic ' + btoa('root:arcadedb');
const res = await axios.post(
'http://localhost:2480/api/v1/command/mydb',
{ language: 'sql', command: 'SELECT FROM V LIMIT 10' },
{ headers: { 'Authorization': auth, 'Content-Type': 'application/json' } },
);
console.log(res.data);
TypeScript
Both pg and fetch work with TypeScript. Install type definitions for pg:
npm install pg @types/pg
import { Client } from 'pg';
const client = new Client({
host: 'localhost',
port: 5432,
database: 'mydb',
user: 'root',
password: 'arcadedb',
});
await client.connect();
const result = await client.query('SELECT FROM V LIMIT 10');
console.log(result.rows);
await client.end();
Further Reading
-
JavaScript Quickstart Tutorial — Step-by-step tutorial
-
Supply Chain Use Case — Full JavaScript implementation with 7 query patterns
5.3.8. Grafana Datasource Plugin
ArcadeDB provides a native Grafana datasource plugin written in Go. The plugin connects directly to ArcadeDB’s HTTP/JSON API, allowing you to query your database and build dashboards in Grafana without any intermediate layer.
Installation
Install the plugin from the GitHub repository:
grafana-cli plugins install arcadedb-datasource
Or build from source:
git clone https://github.com/ArcadeData/arcadedb-grafana-datasource.git
cd arcadedb-grafana-datasource
mage -v build:linux
Configuration
-
In Grafana, go to Configuration > Data Sources > Add data source.
-
Select ArcadeDB.
-
Configure the connection:
-
URL: The ArcadeDB server URL (e.g., http://localhost:2480)
-
Database: The database name
-
Authentication: Username and password for ArcadeDB
Usage
Once configured, you can use any of ArcadeDB’s supported query languages (SQL, Cypher, Gremlin, GraphQL, MongoDB QL) directly in Grafana panels to build dashboards and visualizations.
For more information about Grafana integration including monitoring and observability, see Grafana Integration.
5.4. BI and Analytics Integration
ArcadeDB integrates with popular BI and analytics tools through its PostgreSQL wire protocol and dedicated connectors. Since ArcadeDB speaks the PostgreSQL protocol on port 5432, virtually any tool that supports PostgreSQL can connect without a custom driver.
All tools listed below connect via ArcadeDB’s PostgreSQL wire protocol unless stated otherwise. See Postgres Protocol Plugin for setup instructions.
5.4.1. Grafana
Grafana is the leading open-source platform for monitoring and observability. ArcadeDB provides two integration paths: a dedicated Grafana plugin (recommended) and a fallback via the built-in PostgreSQL data source.
ArcadeDB Grafana Plugin (Recommended)
The ArcadeDB Grafana plugin provides a native integration with dedicated query editors, time series visual builder, and graph visualization via Grafana’s Node Graph panel.
Features:
-
Time Series mode - visual query builder with auto-discovered types, fields, tags, and aggregation functions (SUM, AVG, MIN, MAX, COUNT)
-
SQL mode - ArcadeDB SQL with macro expansion ($timeFrom, $timeTo, $timeFilter(col), $interval)
-
Cypher mode - OpenCypher queries with optional Node Graph visualization
-
Gremlin mode - Apache Gremlin traversals with optional Node Graph visualization
-
Alerting - full Grafana alerting support via the Go backend
-
Template variables - dashboard variables populated from ArcadeDB queries
Installation:
grafana-cli plugins install arcadedb-arcadedb-datasource
For self-hosted Grafana with unsigned plugins, add to grafana.ini:
[plugins]
allow_loading_unsigned_plugins = arcadedb-arcadedb-datasource
Configuration:
-
Go to Connections > Data Sources > Add data source and search for ArcadeDB.
-
Set the URL to your ArcadeDB HTTP API (e.g.,
http://localhost:2480).
-
Enable Basic Auth and enter your ArcadeDB credentials.
-
Set the Database name in the ArcadeDB section.
-
Click Save & Test.
Time Series query example:
-
Select Time Series mode.
-
Choose a type (e.g., cpu_metrics), fields (e.g., usage), and optional tag filters.
-
Add an aggregation: AVG on usage with a bucket interval, or leave it to auto-calculate.
SQL query example:
SELECT ts, temperature FROM weather
WHERE $__timeFilter(ts)
ORDER BY ts
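Conceptually, the plugin expands these macros into plain SQL before the query reaches ArcadeDB: the time-filter macro becomes a range predicate over the dashboard's current time window. The sketch below illustrates this idea only; it is not the plugin's actual implementation, and the timestamps are arbitrary epoch-millisecond examples.

```python
import re

# Illustrative macro expansion: $__timeFilter(col) becomes a range predicate
# on that column for the dashboard's current time window (epoch millis).
def expand_time_filter(query: str, start_ms: int, end_ms: int) -> str:
    return re.sub(
        r"\$__timeFilter\((\w+)\)",
        lambda m: f"{m.group(1)} >= {start_ms} AND {m.group(1)} <= {end_ms}",
        query,
    )

sql = "SELECT ts, temperature FROM weather WHERE $__timeFilter(ts) ORDER BY ts"
print(expand_time_filter(sql, 1700000000000, 1700003600000))
```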
Cypher query with Node Graph:
Enable the Node Graph toggle, then:
MATCH (p:Person)-[r:FRIEND_OF]->(f:Person)
RETURN p, r, f LIMIT 100
The result renders as an interactive graph in Grafana’s Node Graph panel.
PostgreSQL Data Source (Fallback)
If you cannot install the ArcadeDB plugin, use Grafana’s built-in PostgreSQL data source with ArcadeDB’s PostgreSQL wire protocol.
-
Go to Connections > Data Sources > Add data source and select PostgreSQL.
-
Set Host to localhost:5432, enter the Database name and credentials.
-
Set TLS/SSL Mode to disable.
-
Click Save & Test.
In panels, switch to Code mode and write SQL:
SELECT name, age FROM Person LIMIT 100
Use the {cypher} prefix for Cypher queries:
{cypher}MATCH (p:Person) RETURN p.name, p.age LIMIT 100
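The language prefix is just text at the start of the query string. A hypothetical helper (illustrative only, not ArcadeDB's actual parser) shows how such a prefix can be split off, defaulting to SQL when no prefix is present:

```python
import re

# Illustrative only: split a "{language}" prefix (e.g. {cypher}, {gremlin})
# from the query text, defaulting to SQL when no prefix is present.
def split_language_prefix(text: str, default: str = "sql"):
    m = re.match(r"\s*\{(\w+)\}\s*(.*)", text, re.DOTALL)
    if m:
        return m.group(1), m.group(2)
    return default, text

print(split_language_prefix("{cypher}MATCH (p:Person) RETURN p.name LIMIT 100"))
```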
| The PostgreSQL fallback does not support the time series visual builder, Node Graph visualization, ArcadeDB-specific macros, or alerting optimizations. Use the dedicated plugin for the full experience. |
Grafana Time Series Endpoints
ArcadeDB also provides native Grafana DataFrame-compatible HTTP endpoints for time series data.
See Time Series for full documentation of the /grafana/health, /grafana/metadata, and /grafana/query endpoints, as well as the PromQL-compatible endpoints.
5.4.2. Apache Superset
Apache Superset is the most popular open-source BI and data exploration platform. It connects to ArcadeDB via the PostgreSQL wire protocol using SQLAlchemy and psycopg2.
Setup
-
In Superset, go to Settings > Database Connections > + Database.
-
Select PostgreSQL.
-
Enter the connection details:
Host: localhost (or your ArcadeDB server host)
Port: 5432
Database: Your ArcadeDB database name
Username: Your ArcadeDB username
Password: Your ArcadeDB password
Or use the SQLAlchemy URI:
postgresql+psycopg2://root:arcadedb@localhost:5432/mydb
-
In the Advanced tab, add the following to Extra to disable SSL:
{ "connect_args": { "sslmode": "disable" } }
-
Click Test Connection, then Connect.
Usage
Use SQL Lab to run queries:
SELECT name, age, city FROM Person LIMIT 100
Graph traversals work via ArcadeDB SQL:
SELECT name, out('FriendOf').size() AS friends
FROM Person
WHERE out('FriendOf').size() > 3
Cypher queries work with the language prefix:
{cypher}MATCH (p:Person)-[:FRIEND_OF]->(f:Person)
RETURN p.name AS person, f.name AS friend LIMIT 50
Charts and dashboards can be created from SQL Lab results or by selecting an ArcadeDB type as a table in the chart builder.
Known Limitations
-
Schema introspection is partial - Superset may not discover all types or columns automatically
-
Language prefixes ({cypher}, {gremlin}) work in SQL Lab but not in the visual chart builder
-
Some ArcadeDB SQL extensions (e.g., out(), in()) may confuse Superset’s query parser in the visual builder
-
SSL is not supported - the sslmode=disable setting is required
-
No graph visualization (Superset does not have a network chart type)
5.4.3. Metabase
Metabase is a popular open-source BI tool known for its ease of use. It connects to ArcadeDB via the built-in PostgreSQL JDBC driver.
Setup
-
In Metabase, go to Settings > Admin > Databases > Add database.
-
Select PostgreSQL.
-
Enter the connection details:
Display name: ArcadeDB
Host: localhost
Port: 5432
Database name: Your ArcadeDB database name
Username: Your ArcadeDB username
Password: Your ArcadeDB password
-
Expand Additional JDBC connection string options and add:
sslmode=disable
-
Click Save. Metabase will sync the schema.
Usage
Click + New > SQL query, select your ArcadeDB database, and write queries:
SELECT name, age FROM Person ORDER BY name LIMIT 50
Graph traversal via ArcadeDB SQL:
SELECT name, out('FriendOf').size() AS friendCount
FROM Person
WHERE out('FriendOf').size() > 2
Cypher via language prefix in native query mode:
{cypher}MATCH (p:Person)-[:FRIEND_OF]->(f:Person)
RETURN p.name AS person, f.name AS friend LIMIT 50
Metabase’s "Simple question" and "Custom question" builders show ArcadeDB types as tables with their properties as columns.
Known Limitations
-
Schema sync may not discover all types or properties
-
The visual question builders work for basic queries but may have gaps for complex filter types
-
Language prefixes only work in native query mode, not the visual builders
-
Auto-generated queries from Metabase may occasionally use PostgreSQL-specific syntax that ArcadeDB does not support
-
SSL is not supported
5.4.4. Tableau
Tableau is a leading commercial BI platform. It connects to ArcadeDB via the built-in PostgreSQL connector.
Setup
-
In Tableau Desktop, go to Connect > To a Server > PostgreSQL.
-
Enter the connection details:
Server: localhost
Port: 5432
Database: Your ArcadeDB database name
Username: Your ArcadeDB username
Password: Your ArcadeDB password
Require SSL: Unchecked
-
Click Sign In.
Usage
For the best experience with ArcadeDB, use Custom SQL rather than dragging tables:
-
In the Data Source tab, click New Custom SQL.
-
Enter your query:
SELECT name, age, city FROM Person
Or with graph traversal:
SELECT name, out('FriendOf').size() AS friendCount, city FROM Person
-
Click OK. Tableau executes the query and shows the result schema.
-
Drag fields to rows/columns to build visualizations.
Cypher queries work via the language prefix in Custom SQL:
{cypher}MATCH (p:Person)-[:FRIEND_OF]->(f:Person)
RETURN p.name AS person, p.city AS city, COUNT(f) AS friendCount
| Use Import mode (extract) rather than Live Connection for large datasets or when ArcadeDB-specific SQL extensions cause issues with Tableau’s query rewriting. |
Known Limitations
-
Schema discovery may show incomplete table/column lists - prefer Custom SQL
-
Tableau may rewrite queries using PostgreSQL-specific syntax that ArcadeDB does not fully support
-
Relationships between tables must be defined manually via Custom SQL
-
No built-in graph/network visualization type
-
SSL is not supported - ensure "Require SSL" is unchecked
5.4.5. Power BI
Microsoft Power BI connects to ArcadeDB via its PostgreSQL connector.
Setup
-
In Power BI Desktop, click Get Data > More… > Database > PostgreSQL database.
-
Click Connect.
-
Enter the connection details:
Server: localhost:5432 (include the port)
Database: Your ArcadeDB database name
Data Connectivity mode: Import (recommended)
-
When prompted for credentials, select Database and enter your ArcadeDB username and password.
-
If prompted about encryption, choose to connect without encryption.
Usage
For custom queries, use the Advanced options in the connection dialog:
-
In the SQL statement field, enter your query:
SELECT name, age, city FROM Person LIMIT 1000
-
Click OK.
Graph traversals and Cypher via language prefix:
SELECT name, out('FriendOf').size() AS friendCount FROM Person WHERE age > 25
{cypher}MATCH (p:Person)-[:FRIEND_OF]->(f:Person)
RETURN p.name AS person, f.name AS friend, p.city AS city
After loading data, build visualizations by dragging fields from the Fields panel. For graph/network visualizations, install a custom visual from AppSource (e.g., "Network Graph by Powerviz").
| Use Import mode instead of DirectQuery. DirectQuery works for basic queries but may have issues with complex types or ArcadeDB-specific SQL extensions. |
Known Limitations
-
DirectQuery mode may not work reliably for all query types - Import mode is recommended
-
The Navigator dialog may show incomplete or no tables - use native SQL queries instead
-
Language prefixes work in the SQL statement field but not through the visual query builder
-
No built-in graph/network visualization - requires a marketplace custom visual
-
SSL is not supported - connections must be unencrypted
-
Refreshing data in Power BI Service requires a gateway configured for PostgreSQL connections
5.5. Operations
5.5.2. Installations from Binaries
ArcadeDB is released for both Java 17 and Java 21.
Java 21 packages are available through the GitHub releases page or Maven Central.
Java 17 packages are available through the GitHub Packages page.
Download the package suitable for your platform and follow the instructions below.
-
Unpack
tar -xzf arcadedb-26.3.1.tar.gz
-
Change into directory:
cd arcadedb-26.3.1
-
-
Launch server
-
Linux / MacOS:
bin/server.sh -
Windows:
bin\server.bat
-
-
Exit server via CTRL+C
-
Interact with server
-
Console:
-
Linux / MacOS:
bin/console.sh -
Windows:
bin\console.bat
-
-
Exit console via CTRL+C
Binaries
Binary packages for Linux / macOS (.tar.gz) and Windows (.zip) are available from the GitHub releases page.
5.5.3. Mac OS X
A popular way to install open-source software on macOS is the Homebrew project.
Currently, ArcadeDB is not available through an official Homebrew formula. To install ArcadeDB on Mac OS X:
-
Download the latest release from https://github.com/ArcadeData/arcadedb/releases
-
Extract the archive to your preferred location (e.g.,
/usr/local/arcadedb) -
Add the
bindirectory to your PATH:
echo 'export PATH="/usr/local/arcadedb/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
5.5.4. Windows via Scoop
Instead of installing manually, you can use the Scoop installer; instructions are available on the Scoop project website.
scoop bucket add extras
scoop install arcadedb
This downloads and installs ArcadeDB and makes the following two commands available:
arcadedb-console
arcadedb-server
You should use these instead of bin\console.bat and bin\server.bat mentioned above.
5.5.5. Custom Package Builder
The ArcadeDB Modular Distribution Builder (arcadedb-builder.sh) allows you to create custom ArcadeDB packages containing only the modules you need. This results in smaller distributions, reduced dependencies, and simplified deployments.
Prerequisites
The builder requires the following tools:
-
curl or wget - for downloading files
-
tar - for extracting and creating archives
-
unzip and zip - for creating zip archives
-
sha256sum or shasum - for checksum verification
-
docker (optional) - for Docker image generation
Quick Start
Run directly with curl (one-liner):
curl -fsSL https://github.com/ArcadeData/arcadedb/releases/download/26.3.1/arcadedb-builder.sh | bash -s -- --version=26.3.1 --modules=gremlin,studio
Download and run interactively:
curl -fsSLO https://github.com/ArcadeData/arcadedb/releases/download/26.3.1/arcadedb-builder.sh
chmod +x arcadedb-builder.sh
./arcadedb-builder.sh
Preview without building (dry run):
./arcadedb-builder.sh --version=26.3.1 --modules=gremlin,studio --dry-run
Available Modules
Core Modules (always included)
-
engine - Database engine
-
server - HTTP/REST API, clustering
-
network - Network communication
Optional Modules
| Module | Description |
|---|---|
|
Interactive database console |
|
Web-based administration interface |
|
Apache Tinkerpop Gremlin support |
|
PostgreSQL wire protocol compatibility |
|
MongoDB wire protocol compatibility |
|
Redis wire protocol compatibility |
|
gRPC wire protocol support |
|
GraphQL API support |
|
Prometheus metrics integration |
Usage Examples
Minimal build (PostgreSQL protocol only):
./arcadedb-builder.sh --version=26.3.1 --modules=postgresw
Development build:
./arcadedb-builder.sh \
--version=26.3.1 \
--modules=console,gremlin,studio \
--output-name=arcadedb-dev
Production build (no Studio):
./arcadedb-builder.sh \
--version=26.3.1 \
--modules=postgresw,metrics \
--output-name=arcadedb-prod
CI/CD build:
./arcadedb-builder.sh \
--version=26.3.1 \
--modules=gremlin,studio \
--quiet \
--skip-docker \
--output-dir=/tmp/builds
Command-Line Options
| Option | Description |
|---|---|
|
ArcadeDB version to build (required for non-interactive mode) |
|
Comma-separated list of modules |
|
Custom name for distribution |
|
Output directory (default: current directory) |
|
Use local Maven repository or custom JAR directory |
|
Use local base distribution file |
|
Build Docker image with specified tag |
|
Skip Docker image build |
|
Only generate Dockerfile, don’t build image |
|
Show what would be done without executing |
|
Enable verbose output |
|
Suppress non-error output |
|
Show help message |
Output Files
The builder creates:
-
{output-name}.zip - Zip archive
-
{output-name}.tar.gz - Compressed tarball
-
Docker image with tag {docker-tag} (if not skipped)
For more detailed documentation, see the Builder README and Modular Builder Guide.
5.5.6. Installation with Docker
ArcadeDB can be run in a Docker container. Images for both Java 21 and Java 17 are available.
Java 21 ArcadeDB is available as a Docker image on DockerHub.
Java 17 ArcadeDB is available as a Docker image on GitHub Container Registry: https://github.com/orgs/ArcadeData/packages?ecosystem=container
To pull the latest Java 21 image from DockerHub:
docker pull arcadedb/arcadedb:26.3.1
To pull the latest Java 17 image from GitHub Container Registry:
docker pull ghcr.io/arcadedata/arcadedb:26.3.1-java17
5.5.7. Server
To start ArcadeDB as a server run the script server.sh under the bin directory of ArcadeDB distribution. If you’re using MS Windows OS, replace server.sh with server.bat.
$ bin/server.sh
█████╗ ██████╗ ██████╗ █████╗ ██████╗ ███████╗██████╗ ██████╗
██╔══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗
███████║██████╔╝██║ ███████║██║ ██║█████╗ ██║ ██║██████╔╝
██╔══██║██╔══██╗██║ ██╔══██║██║ ██║██╔══╝ ██║ ██║██╔══██╗
██║ ██║██║ ██║╚██████╗██║ ██║██████╔╝███████╗██████╔╝██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝╚═╝ ╚═╝╚═════╝ ╚══════╝╚═════╝ ╚═════╝
PLAY WITH DATA arcadedb.com
2025-12-09 22:22:22.839 INFO [ArcadeDBServer] <ArcadeDB_0> ArcadeDB Server v26.3.1 is starting up...
2025-12-09 22:22:22.841 INFO [ArcadeDBServer] <ArcadeDB_0> Running on Mac OS X 15.6 - OpenJDK 64-Bit Server VM 25.0.1 (Homebrew)
2025-12-09 22:22:22.843 INFO [ArcadeDBServer] <ArcadeDB_0> Starting ArcadeDB Server in development mode with plugins [] ...
2025-12-09 22:22:22.878 INFO [ArcadeDBServer] <ArcadeDB_0> - Metrics Collection Started...
2025-12-09 22:22:22.891 INFO [ArcadeDBServer] <ArcadeDB_0> Server root path: .
2025-12-09 22:22:22.891 INFO [ArcadeDBServer] <ArcadeDB_0> Databases directory: ./databases
2025-12-09 22:22:22.891 INFO [ArcadeDBServer] <ArcadeDB_0> Backups directory: ./backups
+--------------------------------------------------------------------+
| WARNING: FIRST RUN CONFIGURATION |
+--------------------------------------------------------------------+
| This is the first time the server is running. Please type a |
| password of your choice for the 'root' user or leave it blank |
| to auto-generate it. |
| |
| To avoid this message set the environment variable or JVM |
| setting `arcadedb.server.rootPassword` to the root password to use.|
+--------------------------------------------------------------------+
Root password [BLANK=auto generate it]: *
The first time the server runs, the root password must be entered and confirmed.
The hash (+salt) of the entered password is stored in the file config/server-users.json.
The password length must be between 8 and 256 characters.
To learn more about this topic, see Security.
Delete this file and restart the server to re-enter the password for the server’s root user.
The default rules of security are pretty basic. You can implement your own security policy. Check the Security Policy.
You can skip the request for the password by passing it as a setting. Example:
-Darcadedb.server.rootPassword=this_is_a_password
Alternatively the password can be passed file-based. Example:
-Darcadedb.server.rootPasswordPath=/run/secrets/root
which is particularly useful for container-based deployments.
| The password file is a plain-text file and should not contain any line breaks / new lines. |
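As a quick sanity check, the sketch below (file path hypothetical) writes such a secret file without a trailing newline and verifies its content:

```python
import tempfile
from pathlib import Path

# Write the root password to a secrets file with no trailing newline,
# as required by arcadedb.server.rootPasswordPath (this path is hypothetical).
secret = Path(tempfile.gettempdir()) / "arcadedb-root-secret"
secret.write_text("this_is_a_password")  # note: no "\n" appended

content = secret.read_text()
print(repr(content))
```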
Once the root user’s password has been entered, you should see output like this:
Root password [BLANK=auto generate it]: *********
Please type the root password for confirmation (copy and paste will not work): *********
2025-12-09 22:23:33.719 INFO [HttpServer] <ArcadeDB_0> - Starting HTTP Server (host=0.0.0.0 port=[2480, 2489] httpsPort=[2490, 2499])...
2025-12-09 22:23:33.738 INFO [undertow] starting server: Undertow - 2.3.20.Final
2025-12-09 22:23:33.741 INFO [xnio] XNIO version 3.8.16.Final
2025-12-09 22:23:33.744 INFO [nio] XNIO NIO Implementation Version 3.8.16.Final
2025-12-09 22:23:33.803 INFO [HttpServer] <ArcadeDB_0> - HTTP Server started (host=0.0.0.0 port=2480 httpsPort=2490)
2025-12-09 22:23:33.908 INFO [ArcadeDBServer] <ArcadeDB_0> Available query languages: [sqlscript, mongo, gremlin, java, cypher, js, graphql, sql]
2025-12-09 22:23:33.909 INFO [ArcadeDBServer] <ArcadeDB_0> ArcadeDB Server started in 'development' mode (CPUs=8 MAXRAM=4,00GB)
2025-12-09 22:23:33.910 INFO [ArcadeDBServer] <ArcadeDB_0> Studio web tool available at http://192.168.1.102:2480
A warning WARNING: Using incubator modules: jdk.incubator.vector in the log can be safely ignored.
By default, the following components start with the server:
-
JMX Metrics, to monitor server performance and statistics (served via port 9999).
-
HTTP Server, that listens on port 2480 by default. If port 2480 is already occupied, then the next is taken up to 2489.
In the output above, ArcadeDB_0 is the server name; this is the default.
To specify a different name, define it with the setting server.name, for example:
$ bin/server.sh -Darcadedb.server.name=ArcadeDB_Europe_0
In a high availability (HA) configuration, it’s mandatory that all the servers in a cluster have different names.
Start server hint
To start the server from a location different than the ArcadeDB folder,
for example, if starting the server as a service,
set the environment variable ARCADEDB_HOME to the ArcadeDB folder:
$ export ARCADEDB_HOME=/path/to/arcadedb
Server modes
The server can be started in one of three modes, which affect the studio, logging, and security defaults:
| Mode | Studio | Logging | WAL Flush | LOAD CSV file access |
|---|---|---|---|---|
| development | Yes | Detailed | No flush (0) | Enabled |
| test | Yes | Brief | No flush (0) | Enabled |
| production | No | Brief | Auto-set to 1 | Auto-disabled if not configured |
The mode is controlled by the server.mode setting; the default is development.
Production mode defaults
When the server starts in production mode, ArcadeDB automatically applies safe defaults for settings that have not been explicitly configured:
-
WAL flush (
arcadedb.txWalFlush): set to 1 (flush without metadata) for transaction durability. If you explicitly set it to 0, a warning is logged. See WAL Flush and Durability for details.
-
LOAD CSV file access (
arcadedb.opencypher.loadCsv.allowFileUrls): disabled to prevent Cypher queries from reading local files on the server. Set to true explicitly if needed.
These defaults are only applied when the setting has not been explicitly configured by the user (via system property, environment variable, or API call). Explicit settings are always respected.
At startup, a production checklist is logged summarizing the security-relevant configuration:
INFO Production checklist:
INFO - WAL flush: 1 (flush without metadata) [OK]
INFO - SSL: disabled. Consider enabling for encrypted connections
INFO - High availability: disabled. Consider enabling for fault tolerance
INFO - LOAD CSV file access: disabled [OK]
INFO - Backup support: enabled [OK]
Create default database(s)
Instead of starting a server and then connecting to it to create databases, the ArcadeDB Server can create an initial list of default databases at startup via the setting server.defaultDatabases.
$ bin/server.sh "-Darcadedb.server.defaultDatabases=Universe[albert:einstein]"
With the example above, the database "Universe" will be created if it doesn’t exist, with user "albert" and password "einstein".
Due to the use of [], the command line argument needs to be wrapped in quotes.
A default database without users still needs to include empty brackets, i.e.: -Darcadedb.server.defaultDatabases=Multiverse[]
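To make the setting's format concrete, here is a hypothetical parser (the function name and helper logic are illustrative only), assuming ; separates databases and , separates user:password pairs inside the brackets:

```python
import re

# Hypothetical parser illustrating the server.defaultDatabases format,
# assuming ";" separates databases and "," separates user:password pairs.
def parse_default_databases(value: str) -> dict:
    databases = {}
    for entry in filter(None, value.split(";")):
        m = re.fullmatch(r"(\w+)\[([^\]]*)\]", entry.strip())
        if not m:
            raise ValueError(f"malformed entry: {entry!r}")
        name, creds = m.group(1), m.group(2)
        databases[name] = [tuple(c.split(":", 1)) for c in creds.split(",") if c]
    return databases

print(parse_default_databases("Universe[albert:einstein];Multiverse[]"))
```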
Once the server is started, multiple clients can connect to it using one of the supported protocols (HTTP/JSON, PostgreSQL, MongoDB, or Redis).
Logging
The log files are created in the folder ./log with the filenames arcadedb.log.X,
where X is a number between 0 and 9, used for log rotation.
The current log file has the number 0, and is rotated based on server starts or file size.
By default, ArcadeDB does not log debug messages to the console or file. You can change this setting by editing the file config/arcadedb-log.properties. The file is a standard logging configuration file.
The default configuration is the following.
1 handlers = java.util.logging.ConsoleHandler, java.util.logging.FileHandler
2 .level = INFO
3 com.arcadedb.level = INFO
4 java.util.logging.ConsoleHandler.level = INFO
5 java.util.logging.ConsoleHandler.formatter = com.arcadedb.utility.AnsiLogFormatter
6 java.util.logging.FileHandler.level = INFO
7 java.util.logging.FileHandler.pattern=./log/arcadedb.log
8 java.util.logging.FileHandler.formatter = com.arcadedb.log.LogFormatter
9 java.util.logging.FileHandler.limit=100000000
10 java.util.logging.FileHandler.count=10
Where:
-
Line 1 contains 2 loggers, the console and the file. This means logs will be written both to the console (process output) and to the configured file (see line 7)
-
Line 2 sets INFO (information) as the default logging level for all the Java classes between
FINER,FINE,INFO,WARNING,SEVERE -
Line 3 is as (line 2) but sets the level for ArcadeDB package only
SEVERE -
Line 4 sets the minimum level the console logger filters the log file (below
INFOlevel will be discarded) -
Line 5 sets the formatter used for the console. The
AnsiLogFormattersupports ANSI color codes -
Line 6 sets the minimum level the file logger filters the log file (below
INFOlevel will be discarded) -
Line 7 sets the path where to write the log file (the file will have a counter suffix, see line 10)
-
Line 8 sets the formatter used for the file
-
Line 9 sets the maximum file size for the log, before creating a new file. By default it is 100MB
-
Line 10 sets the number of files to keep in the directory. By default it is 10. This means that after the 10th file, the oldest file will be removed
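For example, to see debug messages, lines 2, 3, 4, and 6 of the default configuration could be lowered to FINE (a sketch; both the logger levels and the handler levels need lowering, since handlers filter independently):

```properties
.level = FINE
com.arcadedb.level = FINE
java.util.logging.ConsoleHandler.level = FINE
java.util.logging.FileHandler.level = FINE
```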
If you’re running ArcadeDB in embedded mode, make sure the logging configuration is applied by specifying the arcadedb-log.properties file at JVM startup:
$ java ... -Djava.util.logging.config.file=$ARCADEDB_HOME/config/arcadedb-log.properties ...
You can also use your own configuration for logging. In this case replace the path above with your own file.
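Since ArcadeDB relies on standard java.util.logging, an embedded application can also adjust log levels programmatically instead of (or in addition to) a properties file. A minimal JDK-only sketch (the logger name "com.arcadedb" mirrors the package prefix used in the configuration above; the class name is illustrative):

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class EmbeddedLogConfig {
  public static void main(String[] args) {
    // Raise the ArcadeDB package logger to FINE to see debug messages.
    Logger arcadeLogger = Logger.getLogger("com.arcadedb");
    arcadeLogger.setLevel(Level.FINE);

    // Handlers filter independently of the logger, so lower the console
    // handler as well, otherwise FINE records are still discarded.
    ConsoleHandler console = new ConsoleHandler();
    console.setLevel(Level.FINE);
    arcadeLogger.addHandler(console);
  }
}
```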
Server Plugins (Extend The Server)
You can extend the ArcadeDB server by creating custom plugins. A plugin is a Java class that implements the interface com.arcadedb.server.ServerPlugin:
public interface ServerPlugin {
  void startService();

  default void stopService() {
  }

  default void configure(ArcadeDBServer arcadeDBServer, ContextConfiguration configuration) {
  }

  default void registerAPI(final HttpServer httpServer, final PathHandler routes) {
  }
}
Once the plugin is registered (see below), the ArcadeDB Server will instantiate your plugin class and call its configure() method, passing the server configuration. At server startup, the startService() method is invoked. When the server is shut down, stopService() is invoked, where you can free any resources used by the plugin. The method registerAPI(), if implemented, is invoked while the HTTP server is initializing, and allows you to register your own HTTP commands. For more information about how to create custom HTTP commands, look at Custom HTTP commands.
Example:
package com.yourpackage;

import com.arcadedb.ContextConfiguration;
import com.arcadedb.server.ArcadeDBServer;
import com.arcadedb.server.ServerPlugin;
import com.arcadedb.server.http.HttpServer;
import io.undertow.server.handlers.PathHandler;

public class MyPlugin implements ServerPlugin {
  @Override
  public void startService() {
    System.out.println("Plugin started");
  }

  @Override
  public void stopService() {
    System.out.println("Plugin halted");
  }

  @Override
  public void configure(ArcadeDBServer arcadeDBServer, ContextConfiguration configuration) {
    System.out.println("Plugin configured");
  }

  @Override
  public void registerAPI(final HttpServer httpServer, final PathHandler routes) {
    System.out.println("Registering HTTP commands");
  }
}
To register your plugin, add its name and class (with the full package name) to the
arcadedb.server.plugins setting:
Example:
$ java ... -Darcadedb.server.plugins=MyPlugin:com.yourpackage.MyPlugin ...
In case of multiple plugins, use a comma (,) to separate them.
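For instance, registering both the MyPlugin class from above and the Prometheus metrics plugin described below would look like:

```shell
$ java ... -Darcadedb.server.plugins="MyPlugin:com.yourpackage.MyPlugin,Prometheus:com.arcadedb.metrics.prometheus.PrometheusMetricsPlugin"
```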
Metrics
The ArcadeDB server can collect, log and publish metrics. To activate the collection of metrics use the setting:
$ ... -Darcadedb.serverMetrics=true
To log the metrics to the standard output use the setting:
$ ... -Darcadedb.serverMetrics.logging=true
To publish the metrics as Prometheus via HTTP, add the plugin:
$ ... -Darcadedb.server.plugins="Prometheus:com.arcadedb.metrics.prometheus.PrometheusMetricsPlugin"
Then, under http://localhost:2480/prometheus (or the respective ArcadeDB host) the metrics can be requested given server credentials.
For details about the response format see the Prometheus docs.
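As a quick check against a running server, the endpoint can be queried with curl using the server credentials (user and password here are placeholders, not defaults):

```shell
$ curl -u root:your-password http://localhost:2480/prometheus
```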
5.5.8. Changing Settings
To change the default value of a setting, always put arcadedb. as a prefix. Example:
$ java -Darcadedb.dumpConfigAtStartup=true ...
To change the same setting via Java code:
GlobalConfiguration.findByKey("arcadedb.dumpConfigAtStartup").setValue(true);
Check the Appendix for all the available settings.
Environment Variables
The server script parses a set of environment variables which are summarized below:
|Variable|Description|
|---|---|
|JAVA_HOME|JVM location|
|JAVA_OPTS|JVM options|
|ARCADEDB_HOME|ArcadeDB location|
|ARCADEDB_PID|ArcadeDB PID file|
|ARCADEDB_OPTS_MEMORY|JVM memory options|
For default values see the server.sh and server.bat scripts.
RAM Configuration
The ArcadeDB server, by default, uses dynamic allocation for RAM. Sometimes you want to limit this to a specific amount. You can define the environment variable ARCADEDB_OPTS_MEMORY to tune the JVM settings for RAM usage.
Example to use 800M fixed RAM for ArcadeDB server:
$ export ARCADEDB_OPTS_MEMORY="-Xms800M -Xmx800M"
$ bin/server.sh
ArcadeDB can run with as little as 16MB of RAM. In case you’re running ArcadeDB with less than 800MB of RAM, you should set the "low-ram" profile:
$ export ARCADEDB_OPTS_MEMORY="-Xms128M -Xmx128M"
$ bin/server.sh -Darcadedb.profile=low-ram
Setting a profile is like executing a macro that changes multiple settings at once. You can tune them individually, check Settings.
In case of memory latency problems under Linux systems, the following JVM setting can improve performance:
$ export ARCADEDB_OPTS_MEMORY="-XX:+PerfDisableSharedMem"
For more information, see https://www.evanjones.ca/jvm-mmap-pause.html
The Java heap memory is by default configured for desktop use; for custom containers, memory configuration can be adapted by:
$ export ARCADEDB_OPTS_MEMORY="-XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=75.0"
For more information about Java memory configuration, see this article.
5.5.9. Backup of a Database
ArcadeDB allows executing a non-stop backup of a database while it is in use, without blocking writes or affecting performance.
There are two ways to perform backups:

- Manual backup - Execute a one-time backup using SQL or the Studio UI
- Automatic backup - Schedule recurring backups with retention policies (see Automatic Backup Scheduler)
For manual backups, you can execute the backup of a database from SQL. Look at Backup Database SQL command for more information.
Configuration
- -f <backup-file> (string) filename of, or path to, the backup file to create.
- -d <database-path> (string) path on the local filesystem where to find the ArcadeDB database.
- -o (boolean) true to overwrite the backup if it already exists. If false and the backup-path already exists, an error is thrown. Default is false.