Enhanced Differential Testing in Emerging Database Systems

Yuancheng Jiang, Jianing Wang, Chuqi Zhang, Roland Yap, Zhenkai Liang, Manuel Rigger
National University of Singapore
Singapore, Singapore
{yuancheng,jianing,chuqiz,ryap,liangzk,rigger}@comp.nus.edu.sg

arXiv:2501.01236v1 [cs.SE] 2 Jan 2025

Abstract
In recent years, a plethora of database management systems have surfaced to meet the demands of various scenarios. Emerging database systems, such as time-series and streaming database systems, are tailored to specific use cases requiring enhanced functionality and performance. However, as they are typically less mature, they can contain bugs that either cause incorrect results or produce errors impacting reliability. To tackle this, we propose enhanced differential testing to uncover various bugs in emerging SQL-like database systems. The challenge is how to deal with the differences among these emerging databases. Our insight is that many emerging database systems are conceptually extensions of relational database systems, making it possible to reveal logic bugs by leveraging existing relational, known-reliable database systems. However, due to inevitable syntax and semantics gaps, it remains challenging to scale differential testing to various emerging database systems. We enhance differential testing for emerging database systems in three steps: (i) identifying shared clauses; (ii) extending shared clauses by mapping new features back to existing clauses of relational database systems; and (iii) generating differential inputs using the extended shared clauses. We implemented our approach in a tool called SQLxDiff and applied it to four popular emerging database systems. In total, we found 57 unknown bugs, of which 17 were logic bugs and 40 were internal errors. Overall, vendors fixed 50 bugs and confirmed 5. Our results demonstrate the practicality and effectiveness of SQLxDiff in detecting bugs in emerging database systems, which has the potential to improve the reliability of their applications.

CCS Concepts
• Software and its engineering → Software testing and debugging.

1 Introduction
Recently, the landscape of database systems has witnessed a significant transformation with the emergence of diverse and specialized database systems (i.e., emerging database systems). These emerging database systems have been developed to address the evolving needs of modern applications and the complexities of handling large amounts of data. Unlike traditional ones, emerging database systems are designed with a focus on specific use cases, offering optimized solutions for various industries and applications. For instance, time-series database systems (e.g., QuestDB [33]) are tailored for time-series data. Streaming database systems (e.g., RisingWave [37]) differ in that they process streaming and in-memory queries with low latency.

Conf ’25, June 25–28, 2025, Trondheim, Norway
2025. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM
https://doi.org/10.1145/nnnnnnn.nnnnnnn

Emerging database systems, often developed under market pressure, are typically less mature and may have more bugs than established relational database systems. Database system testing aims to detect such bugs, which include internal errors and logic bugs. Internal errors are unexpected aborts, exceptions, or crashes in database systems, while logic bugs silently cause applications using these database systems to produce incorrect outcomes. While we detect both types of bugs, we place greater emphasis on detecting logic bugs. Unlike directly noticeable crash bugs, logic bugs often escape the attention of users and developers. Consequently, a proper test oracle is needed to uncover these bugs effectively.

Despite the importance of finding bugs in emerging database systems, the problem has not garnered sufficient attention from developers and researchers. The only related work, Unicorn [44], aims to find internal errors in time-series database systems. However, this approach, which uses fuzzing techniques to trigger crashes, is specifically tailored to time-series database systems and lacks a suitable test oracle for finding logic bugs. Moreover, it is not versatile enough to be extended to bug finding across a variety of emerging database systems. This limitation arises from the distinct semantics and data representations inherent to each database system. Such diversity calls for a more adaptable and comprehensive strategy to identify more complex bugs in these varied environments.

In this work, our insight is that many emerging database systems can conceptually be seen as extensions of relational database systems, making it possible to reveal various bugs by reference to results from relational ones (i.e., differential testing), which have been extensively tested [4, 20, 24, 34–36, 49] and are more robust. To this end, we generate differential inputs based on clauses with the same syntax and semantics in both emerging and relational database systems, which we call shared clauses. However, a practical challenge arises: differential testing is effective only for shared clauses, as differing syntax and semantics could produce false alarms. This limitation means that shared clauses may represent only a small fraction of dissimilar database systems’ query languages, thereby restricting the effectiveness of differential testing in exposing bugs.

To tackle this limitation, we introduce a database system testing tool with enhanced differential testing, SQL-Cross-Differing (SQLxDiff), to uncover bugs in SQL-like emerging database systems with the following steps: (i) clause identification, (ii) clause mapping, and (iii) differential input generation. Specifically, SQLxDiff first identifies the supported clauses C1 in the emerging (test) database system and the supported clauses C2 in the relational (reference) database system. The common clauses, C = C1 ∩ C2, form the shared clauses. Next, to realize our key insight, SQLxDiff expands the shared clauses by mapping dedicated clauses of the test database system onto existing clauses of the reference database system (i.e., it obtains the mapped clauses C3 and the extended set C ∪ C3).
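The clause-identification step (computing C = C1 ∩ C2) can be sketched as follows. SQLxDiff itself is not public, so this is our own illustrative construction, not the authors’ code: each candidate clause is probed with a small canary query, and two in-memory SQLite instances stand in for the test and reference systems (the probe names and queries are assumptions for demonstration).

```python
import sqlite3

# Hypothetical canary queries, one per candidate clause. With a real
# QuestDB/PostgreSQL pair, count_distinct would succeed only on the test
# system and be classified Mappable; with two identical SQLite stand-ins,
# Mappable cannot arise and it lands in Failed instead.
PROBES = {
    "count_distinct": "SELECT count_distinct(c0) FROM probe",
    "count(distinct)": "SELECT count(DISTINCT c0) FROM probe",
    "union": "SELECT c0 FROM probe UNION SELECT c0 FROM probe",
}

def supports(conn, query):
    """A clause is 'supported' if its canary query executes without error."""
    try:
        conn.execute(query).fetchall()
        return True
    except sqlite3.Error:
        return False

def classify(test_conn, ref_conn):
    """Split candidate clauses into Shared (both), Mappable (one), Failed (neither)."""
    shared, mappable, failed = [], [], []
    for name, query in PROBES.items():
        on_test, on_ref = supports(test_conn, query), supports(ref_conn, query)
        if on_test and on_ref:
            shared.append(name)       # member of C = C1 ∩ C2
        elif on_test or on_ref:
            mappable.append(name)     # candidate for a clause mapping (C3)
        else:
            failed.append(name)       # discarded
    return shared, mappable, failed

# Two in-memory SQLite instances play the roles of test and reference system.
test_db, ref_db = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (test_db, ref_db):
    db.execute("CREATE TABLE probe (c0 INT)")

print(classify(test_db, ref_db))
```

Under these stand-in engines, `count(distinct)` and `union` come out Shared, while the QuestDB-specific `count_distinct` fails on both.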
[Figure 1: Popularity Trend of Database Systems Past Decade — normalized popularity scores (2013 = 100, semiannual samples through 2024) for categories including search engines, RDF stores, object-oriented DBMS, native XML DBMS, vector DBMS, wide-column stores, multivalue DBMS, spatial DBMS, graph DBMS, time-series DBMS, key-value stores, relational DBMS, and others.]

Listing 1: Motivating Logic Bug Found in QuestDB
CREATE TABLE test (c0 INT); -- Initialize Schema --
INSERT INTO test VALUES (NULL); -- Initialize Data --
Q: SELECT (c0 IN (0, NULL)) FROM test; -- False Alarm --
-- QuestDB Result: [(True)] Postgres Result: [(Null)] --
Q(mapped): SELECT (CASE WHEN c0 IS NULL THEN NULL ELSE
c0 IN (0) END) FROM test; -- True Fixed Bug --
-- QuestDB Result After Mapping: [(False)]!=[(Null)] --
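The discrepancy in Listing 1 hinges on SQL’s three-valued logic. The ANSI behavior that PostgreSQL follows can be reproduced with SQLite as a stand-in reference system (a sketch for illustration only; QuestDB itself is not involved here):

```python
import sqlite3

# Reproduce the reference-side semantics of Listing 1: under SQL's
# three-valued logic, NULL IN (0, NULL) evaluates to NULL, not TRUE.
# SQLite follows the same rule as PostgreSQL for this expression.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (c0 INT)")
conn.execute("INSERT INTO test VALUES (NULL)")

# Original query Q from Listing 1.
(orig,) = conn.execute("SELECT (c0 IN (0, NULL)) FROM test").fetchone()

# Mapped query Q(mapped): handle NULL explicitly, then fall back to IN.
(mapped,) = conn.execute(
    "SELECT (CASE WHEN c0 IS NULL THEN NULL ELSE c0 IN (0) END) FROM test"
).fetchone()

print(orig, mapped)  # None None — the mapped form preserves the NULL result
```

Both queries return NULL (Python `None`) on a conforming engine, which is why QuestDB’s post-mapping `False` exposes a genuine bug rather than a semantic difference.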

Once we have a larger shared clause set, SQLxDiff generates semantically equivalent, but syntactically different, queries Q1 and Q2 by randomly adding shared or mapped clauses for the two target database systems. Such mapped query pairs are not guaranteed to be syntactically equal but should fetch the same result on different database instances. Therefore, if their results R1 and R2 are not equal, we detect a logic bug in the emerging database system, given our trust in the maturity of relational database systems.

Listing 1 demonstrates how our approach enhances differential testing. The initial query Q yields a different result in the test database system (QuestDB) than in the reference database system (PostgreSQL). Nevertheless, this difference is expected after investigating their distinct treatments of null.1 Current differential testing methods either report such queries as differing, a false alarm, or avoid testing such features (e.g., null values) altogether. Our approach leverages clause mappings to address this gap. As shown in query Q(mapped), we map the in clause to a case...when clause, which lets us first handle null values and otherwise return the boolean value for non-null expressions. If the difference persists after mapping, we have identified a bug-inducing case. In this instance, QuestDB promptly acknowledged and resolved the issue.

To assess the effectiveness of our technique, we used SQLxDiff to test four popular emerging database systems: QuestDB [33], TDEngine [41], RisingWave [37], and CrateDB [9]. All selected emerging database systems have growing popularity (e.g., at least thousands of stars on GitHub) and are actively maintained and updated. We found a total of 57 previously unknown bugs (17 logic bugs and 40 internal errors), of which 50 have been fixed and 5 confirmed. We evaluated SQLxDiff on its improvements to bug finding for SQL-like emerging database systems: (i) demonstrating its effectiveness in finding more unknown bugs; and (ii) achieving greater coverage with more unique query plans. We also compared SQLxDiff against the state-of-the-art test oracle, Ternary Logic Partitioning (TLP) [35], and its bug-finding tool SQLancer [36]. The results show that SQLxDiff is more effective in detecting bugs (e.g., it identified 17 logic bugs that SQLancer was unable to find) and covers more unique query plans and greater code coverage when testing SQL-like emerging database systems. Our efforts received encouraging acknowledgments from QuestDB developers in a blog post.2

In summary, we make the following contributions:
• We propose a key insight that emerging database systems are often extensions of relational ones, enabling extended differential testing through syntax mapping between database instances.
• We develop a practical tool called SQLxDiff,3 which generates semantically equivalent queries expressed in different syntax via clause identification, clause mapping, and query generation.
• SQLxDiff has found many bugs (logic bugs and internal errors) in popular emerging database systems. Our bug reports were positively received and acknowledged by the developers.

2 Emerging Database Systems
In this section, we describe background knowledge about emerging database systems to (i) illustrate the importance of testing them, and (ii) give an intuition of our proposed testing methodology. Firstly, we examine the scope of these newly introduced database systems and study their popularity. Subsequently, we study the feature differences among these emerging database systems and explain that their new features can be expressed through existing expressions in relational database systems. Moreover, we explore their query languages and interfaces to show that Structured Query Language (SQL) is the predominant query language. In summary, we formulate and address the following research questions:
- RQ1: What are the types of emerging database systems, and how does their popularity compare with relational database systems?
- RQ2: What are the new features in emerging database systems?
- RQ3: What are the query languages in emerging database systems?

To systematically explore emerging database systems, we collect the latest statistics from reputable database survey platforms, including db-systems [11] and the database of databases (dbdb) [10]. These platforms provide up-to-date information on newly introduced database systems, including popularity scores, database types, and query interfaces (query languages). We also refer to their official documentation and open-source platforms (e.g., GitHub) for supplementary statistics and information.

RQ1: Types and popularity. One notable trend in the evolution of database systems is the increasing popularity of emerging database systems, driven by their introduction of new features and specialized data models tailored to specific data types and use cases. However, these emerging database systems face the challenge of improving their accuracy and robustness through effective database system testing. In this paper, we categorize database systems into two groups: (a) conventional database systems and (b) emerging database systems.

Conventional Database Systems. Conventional database systems typically refer to relational database systems, which have been well-established and extensively developed over several decades. They are the mainstream in database system adoption due to their long-standing reliability. Well-known relational database systems include MySQL [29], SQLite [40], and PostgreSQL [32].

1 QuestDB regards null as a specific value and returns true because null (c0) is indeed in the list, while PostgreSQL returns null directly if the left expression is null.
2 Acknowledged at https://questdb.io/blog/fuzz-testing-questdb
3 SQLxDiff will be available upon paper acceptance.
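The resulting test oracle — a semantically equivalent query pair must return equal results — can be sketched as below. This is an illustrative sketch of the oracle idea only, not SQLxDiff’s actual harness (which is unreleased); SQLite stands in for both the test and reference systems, and the query pair mirrors the count_distinct(x) => count(distinct x) style of mapping discussed in the paper.

```python
import sqlite3

def results(conn, query):
    """Fetch results as a sorted list so the comparison is order-insensitive."""
    return sorted(map(repr, conn.execute(query).fetchall()))

def check_pair(test_conn, ref_conn, q_test, q_ref):
    """Differential oracle: unequal result multisets flag a potential logic bug."""
    r1, r2 = results(test_conn, q_test), results(ref_conn, q_ref)
    return "pass" if r1 == r2 else "potential logic bug"

# SQLite stand-ins for the test and reference database systems.
test_db, ref_db = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (test_db, ref_db):
    db.execute("CREATE TABLE t (price INT)")
    db.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,)])

# Both sides use the reference spelling here, since SQLite lacks count_distinct;
# with a real test system, q_test would use the emerging system's dialect.
print(check_pair(test_db, ref_db,
                 "SELECT count(DISTINCT price) FROM t",
                 "SELECT count(DISTINCT price) FROM t"))  # pass
```

A deliberately non-equivalent pair (e.g., count versus count distinct) would be flagged, which is exactly the signal SQLxDiff escalates into a bug report after reproduction.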

Numerous studies [4, 20, 24, 34–36, 49] have focused on testing relational database systems, demonstrating their effectiveness in identifying many previously unknown bugs.

Graph Database Systems. Graph database systems such as Neo4j [31] and TinkerPop [3] are the most popular emerging database systems of the past decade and have experienced rapid growth, as they work directly on graph data. Despite being newer than relational database systems, there has been considerable research on testing them [16, 19, 22, 27, 39, 48, 50] to uncover unknown bugs. We do not discuss them further, given these extensive existing efforts to create effective testing approaches.

Emerging Database Systems. We refer to emerging database systems as those proposed with dedicated features within the last ten years that have seen a notable increase in popularity, with more than a 10% average rise in that period.4 Besides graph database systems, other types of emerging database systems are also growing fast. These database systems (e.g., time-series, vector, streaming, wide-column, and spatial database systems) introduce dedicated features that provide more convenient functionalities. In contrast to their widespread scale and popularity, there has been limited research into automated bug discovery for them [44, 45].

4 This upward trend corresponds to the “Slope of Enlightenment” in the Gartner Hype Cycle [43], reflecting an increasing acknowledgment of their practical usage.

To better present their popularity differences, we collect statistics from the aforementioned survey sources [10, 11] to summarize the popularity trends of widely used database systems since 2013, with an initial popularity score of 100, as shown in Figure 1. The statistics reveal that graph database systems have been the leading database model of the past decade, while relational database systems remain stable at the bottom. They also indicate that emerging database systems are experiencing rapid growth (e.g., time-series database systems are the fastest-growing category apart from graph ones, as shown in Figure 1). Despite this growth, they surprisingly receive less attention in testing efforts compared to graph and relational database systems.

RQ2: New features. To effectively showcase the landscape of emerging database systems, we focus on two types: time-series database systems, due to their growing popularity shown in Figure 1, and streaming ones, for their increasing importance in handling timing-critical tasks.

Time-series database systems have gained prominence in managing data points indexed by time, making them ideal for applications like Internet of Things (IoT) devices, financial trading, and monitoring systems. These database systems (e.g., QuestDB [33], TDEngine [41]) provide new features (e.g., date differing, date sampling) related to dates and timestamps, as well as optimized storage and retrieval of timestamp data, enabling efficient analysis and visualization of trends over time. Listing 2 gives an example query in QuestDB, which utilizes the special clauses sample by and fill to visualize a histogram of timestamp data. This query retrieves results from the table, identifying maximum and minimum values based on timestamps sampled at one-hour intervals. Additionally, it fills in results based on the preceding value in case no results are available for a given duration. One more new feature is the clause latest on, which retrieves the most recent entry by timestamp for a given key or combination of keys. Additionally, time-series database systems include various timestamp functions and operations like dateadd(), datediff(), and in. Such new features provide a direct way to access or modify timestamped data.

Listing 2: New Feature in QuestDB (Time-Series)
SELECT ts,max(a),min(b) FROM T SAMPLE BY 1h FILL(PREV);
-- 2023-01-01T01:00:00.000000Z max1 min1 --
-- 2023-01-01T02:00:00.000000Z max1 min1 -- (filled)
-- 2023-01-01T03:00:00.000000Z max3 min3 --

Streaming database systems like RisingWave [37] are vital for applications demanding instant data updates and seamless synchronization across users and systems. They are used in applications needing streaming features, such as messaging platforms, collaborative tools, and live analytics dashboards, which require low-latency access to data and concurrent user interactions. To achieve this goal, streaming database systems introduce various new features. For instance, RisingWave provides a new window function called tumble. A time window is a temporal interval that users can utilize to segment events and execute data computations over the stream. The tumble(table_or_source, start_time, window_size) function in RisingWave generates a new table alternative for data selection. RisingWave also supports window join, which combines several time windows for convenient queries on streaming data.

RQ3: Query languages. In light of the varied data models proposed by emerging database systems, the statistics from the “database of databases” [10] show a total of 18 distinct query interfaces (e.g., command line, SQL, custom API, Cypher, Datalog). Among 450 individual database systems that have surfaced over the past decade, 158 instances (35.1%) have opted for SQL-like queries as their interfaces. This makes SQL-like query languages the most common choice. Hence, this work focuses on testing emerging database systems with reference to SQL to provide wider applicability.

3 Approach
Our core insight is that many emerging database systems can conceptually be seen as extensions of relational database systems, making it possible to reveal logic bugs by checking whether the emerging database systems’ results match the relational database systems’ ones. We introduce enhanced differential testing, which allows more extensive test cases to be evaluated on various emerging database systems. Our approach has three key steps: (i) identifying shared clauses; (ii) expanding the shared clauses by mapping dedicated clauses or features of emerging database systems back to relational ones; and (iii) generating (mapped) differential inputs by randomly selecting shared and mapped clauses. The results of the differential queries are then compared to detect logic bugs. Given the widespread adoption and massive testing of relational database systems, we consider their results to be reliable and accurate. Therefore, differences in the query results suggest the presence of a logic bug in the emerging database systems.

We believe that this simple approach, which relies on finding and utilizing commonalities in these related query languages, has broad applicability across database systems supporting SQL-like query languages. The clause mapping is not expected to be general but

[Figure 2: Approach Overview — (1) Clause Identification: supported clauses/features (e.g., window, sub-query, sample by, union, tumble, count_distinct, varchar) are probed in the two target database systems and classified as Shared, Failed, or Mappable. (2) Clause Mapping: Mappable clauses from the emerging database system are mapped to relational ones, e.g., f1(): count_distinct(X) => count(distinct X); f2(): A in ‘B’ => extract(‘year’ from A)=B; f3(): “string” type mapping. (3) Query Generation: differential inputs (SQL-like queries) are generated by randomly adding shared and mapped clauses/features, e.g.:
select count_distinct(a.price), avg(b.price+c.price) over(partition by b.ts=c.ts) from (select * from t1 limit 100) as a join t2 as b on t1.ts=t2.ts cross join t3 as c where a.ts in ‘2025’ and b.price<c.price union select count(price), avg(price) over(partition by ts) from t1;
select count(distinct a.price), avg(b.price+c.price) over(partition by b.ts=c.ts) from (select * from t1 limit 100) as a join t2 as b on t1.ts=t2.ts cross join t3 as c where extract(‘year’ from a.ts)=2025 and b.price<c.price union select count(price), avg(price) over(partition by ts) from t1;
(4) Result Analysis: the query pair’s results are compared to decide Passed or Bug.]

necessarily specific to the kind of emerging database systems (or even a specific database system pair) under test. In this paper, we demonstrate the application and effectiveness of our approach on two prominent types of emerging database systems, namely the time-series and streaming database systems identified in Section 2.

Approach overview. Our methodology aims to generate semantically equivalent queries for emerging SQL-like database systems (the system under test) against established relational database systems (the reference result). The basic premise of our test oracle is that semantically equivalent queries should yield identical results, even when they are syntactically different. Figure 2 uses the time-series database system QuestDB to illustrate our approach.

At step 1, SQLxDiff initiates the testing process by determining the supported clauses or features of the target and reference database systems. SQLxDiff employs a series of tests on various clauses to categorize these features into three distinct groups: Shared, Failed, and Mappable. When a clause successfully passes the tests on both database instances, it is labeled Shared and included in the set of shared clauses. Conversely, if a clause fails on both instances, it is labeled Failed and discarded. When a clause partially succeeds, it is designated Mappable, and we attempt to map it in the next step.

Step 2 expands the collection of clauses by using clause mappings to convert Mappable clauses into appropriate representations in the reference database system. For example, we use f1 to convert count_distinct(x) into count(distinct x) to ensure compatibility with PostgreSQL’s function syntax, and f2 adjusts the usage of in in QuestDB to conform to valid SQL syntax for the extract function.

At step 3, SQLxDiff uses the expanded set of clauses (i.e., shared clauses and mapped clauses) to generate valid queries for both database instances. Due to the clause mappings introduced in the preceding step, the generated query pair may be syntactically different, which we call mapped query generation. For each shared or mapped clause, we employ random selection to incorporate such clauses. Figure 2 shows the process of generating mapped queries for testing QuestDB and PostgreSQL, highlighting the mapped clauses. We then execute the queries on the test and reference database systems to retrieve the results.

Finally, SQLxDiff performs result analysis at step 4. Our test oracle assumes that semantically equivalent queries should yield identical results despite syntax differences. While either of the two systems might be affected by a bug, it is more likely to be the emerging database system, due to its lower maturity. Upon reproducing and validating the discrepancies, we document and report the identified bugs to the respective developers. The key challenge of our approach is improving the extensibility of differential testing by extending clauses via clause mappings, which transfer dedicated features of emerging database systems into valid expressions in relational database systems. The exact clause mappings are highly dependent on the emerging database system under test. We believe that developers of emerging database systems have a thorough understanding of how their systems’ features differ from established relational database systems; thus, we expect that they can efficiently adopt our proposed approach. For illustration, we have selected two types of emerging database systems, time-series and streaming, to give examples of mappings.

Clause mapping for time-series database systems. From the statistics in Figure 1, time-series database systems are the second fastest-increasing database model of the past decade. To test them, we first create a database schema with a timestamp column consistently set as the primary key of the table. We demonstrate clause mappings for QuestDB, a fast-growing time-series database system that follows SQL syntax; similar mappings exist for other time-series database systems. PostgreSQL is used as the reliable relational reference database system. In Table 1, we present notable clause mappings along with corresponding query examples before and after the mappings. A detailed explanation follows.

(i) Mapping for clause sample by. From Section 2, sample by is a new clause introduced in QuestDB to summarize large datasets into aggregates over homogeneous time periods as part of a select statement (e.g., to process a histogram of timestamp data). Consider the query select count(*) from sensors sample by 1h in QuestDB, which retrieves the number of records in each hour. Relational database systems, however, do not support such keywords. We address this by translating it into the valid query select sample_by_result from (select count(*) as sample_by_result, extract(hour from date) as hour from sensors

Table 1: Illustrative List of Clause Mappings with Examples from Emerging to Relational Database Systems
ID | Clause | QuestDB (01-08) / RisingWave (09-10) | PostgreSQL Mapping
01 | Sample By | select count(*) from T sample by 1h | select a from (select count(*) as a, extract(hour from date) as b from T group by b)
02 | Null | select A in (1,2,3,null) | select case when A is null then null else A in (1,2,3) end
03 | DateAdd | select dateadd(‘h’,1,ts) | select cast((cast(ts as integer)+3600) as timestamp)
04 | DateDiff | select datediff(‘y’,now(),now()) | select abs(extract(year from now())-extract(year from now()))
05 | Latest On | select * from t latest on a partition by b | .. (select .. join (select distinct max(a) over(partition by b) as a, b from t) ..
06 | Distinct | select count_distinct(c0) from test | select count(distinct c0) from test
07 | Symbol | create table test (c0 int, c1 symbol, c2 timestamp) | create table test (c0 int, c1 varchar(128), c2 timestamp)
08 | Between | select count(*) from T where c0 between 2 and 0 | select count(*) from T where c0 between symmetric 2 and 0
09 | Tumble | select * from tumble(test, c0, interval ‘1 day’) | .. (select *, date_trunc(‘day’,c0) as s .. interval ‘1 day’) .. order by s, e) tumble
10 | Hop | .. hop(test, c2, interval ‘1 day’, interval ‘2 days’) | .. (select *, date_trunc(‘day’,c0) as s .. interval ‘2 day’) .. order by s, e) hop

group by hour) in PostgreSQL via a combination of group by and sub-query to generate an equivalent query to sample by.

(ii) Mapping for null values. The null value in QuestDB differs from most database systems, representing a specific value rather than the special nonexistent value in SQL [14, 15]. When a comparison involves null, QuestDB outputs boolean values while PostgreSQL outputs null. To bridge this gap, we make use of the shared clause case...when to first identify the null value in comparison operands and, if it exists, output null directly in QuestDB instead of boolean values. Consider the query select A in (B). We map this query into select case when A is null then null else A in (B) end to align the result as null with PostgreSQL. Similar clause mappings exist in other comparison clauses (e.g., between). While this appears simple, it is effective in reducing the false alarms of differential testing.

(iii) Mapping for date operations including dateadd, datediff, and in. These functions provide more straightforward interfaces for directly operating on timestamp data. For dateadd, we first cast timestamp data to an integer and cast it back after the calculation. For example, dateadd('h',1,ts) is translated into cast((cast(ts as integer)+3600) as timestamp). For datediff, the query datediff('y',now(),now()) in QuestDB returns the result 0, indicating that the two dates differ by zero years. We adopt clause mappings in our approach and translate it into abs(extract(year from now())-extract(year from now())) as a valid and semantically equivalent query in PostgreSQL. The in operator checks whether the given data is within a range. With timestamp data, it can be used as ts in '2023' to check if the timestamp is in the year 2023. We adapt it to relational database systems via the clause between. For example, ts in '2023-01;-3d' can be translated to ts between '2023-01-01 00:00:00' and '2023-01-28 23:59:59'. Figure 2 also gives a complete example showing the clause mapping of date operations from the target database system to the reference one.

(iv) Mapping for clause latest on. The latest on clause in QuestDB retrieves the most recent data row by timestamp for given keys. Consider the query select * from t latest on c0 partition by c1, where c0 represents the designated timestamp primary key and c1 is an integer data column. PostgreSQL does not support this feature. We map it into select distinct * from (select t1.c0,t1.c1,t1.c2,t1.c3 from t as t1 join (select distinct max(c0) over(partition by c1) as c0, c1 from t) as t2 on t1.c0=t2.c0 and t1.c1=t2.c1) latest_on, replacing latest on with table joins and a window function.

(v) In addition to dedicated features in emerging database systems, clause mappings are essential when common clauses have different semantics or syntax in the two database instances. For example, the clauses between and symmetric between are both used to select values within a given range (i.e., between the lower and upper bounds, inclusively). Some database systems, like QuestDB, implicitly use symmetric between to accommodate invalid ranges, while PostgreSQL explicitly requires the symmetric keyword. Another crucial aspect of clause mapping is type aliasing, which is necessary when working with various database systems that use different type names. To ensure consistency and prevent syntax errors, we employ data type alias mappings including symbol to varchar or text, int to integer, long to bigint, and timestamp to datetime. In some cases, clause mappings help connect similar clause usages. For example, QuestDB has special functions for distinct aggregation, such as count_distinct(x), while PostgreSQL uses the distinct keyword inside the count function as count(distinct x). Although these clause mappings may seem straightforward, they are vital for accurate differential testing across dissimilar database systems.

Clause mapping for streaming database systems. Streaming database systems are designed for handling streaming data with optimizations and advanced features. To deal with streaming data, these systems process information as soon as it arrives, rather than waiting until it has been stored. We show parts of the clause mappings from Table 1 in the streaming database system RisingWave [37] as follows: (i) Mapping for time window function tumble. In RisingWave, the tumble function generates a new table alternative for data selection. Nevertheless, relational database systems do not support such features. We translate tumble into a sub-query in relational database systems with additional columns window_start and window_end. Take select * from tumble(t, c0, interval '1 day') as an example. We translate it into a valid query select * from (select *, date_trunc('day', c0) as s, date_trunc('day', c0 + interval '1 day') as e from t order by s,e) tumble_table in PostgreSQL. (ii) Similar to the tumble function, the hop function creates a new table with additional consideration of time hop windows. We utilize similar translations through sub-queries with adjusted intervals. (iii) Mapping for window join. The window join refers to joining a time window with a table or another time window of the same type. As we map tumble or hop into sub-queries in (i) and (ii), we intuitively join these two sub-queries to achieve an equivalent query in PostgreSQL. (iv) Other similar clause mappings (e.g., type mapping) in QuestDB are also required in RisingWave. For brevity, we omit clause mappings in RisingWave that correspond to those in QuestDB.
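Mechanically, mappings like these can be implemented as a small query rewriter. The sketch below is our own minimal illustration, not SQLxDiff's implementation (the names TYPE_ALIASES and map_clauses are hypothetical), and it uses regular expressions where a production tool would more likely rewrite a parse tree:

```python
import re

# Illustrative subset of the clause mappings discussed above.
TYPE_ALIASES = {"symbol": "varchar", "long": "bigint"}

def map_clauses(query: str) -> str:
    """Rewrite an emerging-DBMS query into PostgreSQL-style syntax."""
    # Function mapping: count_distinct(x) -> count(distinct x).
    query = re.sub(r"count_distinct\(\s*([^)]+?)\s*\)",
                   r"count(distinct \1)", query, flags=re.IGNORECASE)
    # Clause mapping: QuestDB's implicitly symmetric range needs the
    # explicit keyword in PostgreSQL (naive: assumes the keyword is absent).
    query = re.sub(r"\bbetween\b", "symmetric between", query,
                   flags=re.IGNORECASE)
    # Type aliasing, e.g., symbol -> varchar, long -> bigint.
    for src, dst in TYPE_ALIASES.items():
        query = re.sub(rf"\b{src}\b", dst, query, flags=re.IGNORECASE)
    return query

print(map_clauses("select count_distinct(c0) from test"))
# -> select count(distinct c0) from test
```

A robust rewriter would additionally parse the query to avoid touching string literals; the regex form is only meant to convey the idea.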
Conf ’25, June 25–28, 2025, Trondheim, Norway Yuancheng Jiang, Jianing Wang, Chuqi Zhang, Roland Yap, Zhenkai Liang, Manuel Rigger

Figure 3: How Our Approach Improves Differential Testing

Discussion of clause mapping. In this paper, we propose the use of clause mappings to simplify the problem of testing different SQL-like databases with differences in syntax and semantics. One significant advantage is that we employ clause mappings to enhance the testing coverage of database queries by translating clauses between database systems, potentially improving overall testing effectiveness. Figure 3 illustrates that the shared clauses found may be only a small fraction, while expanding with mapped clauses extends the range of generated queries. We believe this approach is highly scalable and can be applied to various other database engines.

Deriving clause mappings requires some manual effort to explore the documentation and conduct mapping experiments on database systems. We highlight that manual effort of some form is common among database system testing works. For example, as a metamorphic testing approach, SQLancer [36] requires considerable manual effort, around 3K lines of Java code for each supported DBMS. Manual effort is needed simply because, even among SQL databases, SQL dialects differ in both syntax and semantics, which is not dissimilar to the need for manual effort in SQLxDiff. Another example is Thanos [13], which uses differential testing. It requires manual effort to collect supported features from various storage engines. In SQLxDiff, our manual effort is a reasonable one-time cost and an easy task for developers of emerging database systems, as they have likely developed their systems' new features after studying the functionalities of existing established database systems. Based on our experience, 10 to 20 clause mappings are already sufficient to identify various bugs, and implementing a clause mapping takes minutes to hours (i.e., a few days to adapt to one emerging DBMS). Furthermore, it may be possible to automatically obtain clause mappings in the future using large language models. Our contribution lies in demonstrating the feasibility of applying differential testing to diverse database systems, thereby enhancing the extensibility of differential testing.

Clause mappings do not need to be complete to enhance differential testing: as shown in Figure 3, differential testing already handles shared clauses, and we enlarge its working scope. Partial clause mappings (e.g., 10 to 20 clause mappings) significantly expand test generation. We highlight that none of the related works [35, 36] considers the completeness of database system testing. Rather, an approach is considered effective if it finds bugs.

4 Implementation

We present the implementation of SQLxDiff as shown in Figure 2: (i) clause identification, (ii) database creation, (iii) query generation, and (iv) result analysis. Note that while some aspects are not the key contribution of our work, they still significantly contribute to effective and extensible differential testing.

Clause identification. Clause identification aims to automatically uncover supported SQL clauses in the given target database system. We continuously collect and complement clause candidates (to be tested via a simple query) over three months by manually analyzing official documentation from various database systems with the assistance of large language models [30]. In addition to various clauses, our approach also tests SQL features (e.g., sub-queries) to ensure the differential inputs reach more complex and deep logic.

Database creation. Before each round of testing, we initialize three tables. These tables can have up to 8 columns, each assigned a data type such as integer, string, or timestamp, which we populate with 50 to 500 rows. The data for these tables is generated randomly, with special values (e.g., null) taken into consideration.

Query generation. We generate SQL statements based on the three initialized tables by randomly adding shared or mapped clauses derived from our main approach. Our query generation is modular and scalable, allowing users to enable or disable various clauses or features as needed. We use several strategies to increase the complexity of generated queries: (i) replacing tables or expressions with sub-queries. Sub-queries are an important feature widely used in SQL that increases query complexity; (ii) using query concatenations. Query concatenations (i.e., union, except, and intersect) can combine multiple queries into a single, more complex query; and (iii) adding advanced features. Advanced features like window functions and branching are also included in query generation.

Result analysis. The test oracle of SQLxDiff expects that query pairs constructed to produce the same results indeed compute the same result when executed on the emerging and relational database systems. Otherwise, it suggests a logic bug. Besides uncovering logic bugs, SQLxDiff also detects internal errors. The main step is to distinguish whether a given exception is expected. We leverage heuristics to match unexpected keywords (e.g., NullPointerException, OutOfBoundException) and exclude valid exceptions (e.g., feature not supported).

5 Evaluation

In this section, we answer the following questions to assess various important aspects of SQLxDiff:
• Q1 Discovery of Unknown Bugs. How effective is SQLxDiff in discovering previously unknown bugs in popular emerging database systems?
• Q2 Improvement on Differential Testing. To what extent does SQLxDiff enhance the effectiveness of differential testing?
• Q3 Comparison with Existing Techniques. What is the improvement in testing emerging database systems compared with the state-of-the-art database testing approaches?

Selected database systems. To assess the efficacy of our approach, we have selected a subset of emerging SQL-like database systems, namely QuestDB [33] (14.5K stars), TDEngine [41] (23.3K stars), RisingWave [37] (6.9K stars), and CrateDB5 [9] (4.1K stars). Emerging database systems have a diverse range of types and models as shown in Section 2, making it challenging to cover all of them in this work. The selection is based on their increasing popularity and similarity to standard SQL syntax. We deliberately avoid including emerging database systems that are implemented as close extensions of well-known relational ones (e.g., Timescale [42], closely

5 CrateDB has multiple database models including time-series, geospatial, and vector.
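The result-analysis step can be sketched as a small differential oracle. Below is a hedged illustration in which two SQLite connections stand in for the emerging and reference systems; run_pair and UNEXPECTED are our own illustrative names, and SQLxDiff's actual heuristics are richer:

```python
import sqlite3
from collections import Counter

# Keywords that mark an exception as an internal error rather than an
# expected rejection (illustrative, mirroring the heuristic above).
UNEXPECTED = ("NullPointerException", "OutOfBoundException", "core dumped")

def run_pair(target, reference, target_sql, reference_sql):
    """Differential oracle: compare a mapped query pair on two systems."""
    try:
        target_rows = target.execute(target_sql).fetchall()
    except Exception as exc:
        return ("internal-error"
                if any(k in str(exc) for k in UNEXPECTED)
                else "expected-error")
    reference_rows = reference.execute(reference_sql).fetchall()
    # Without ORDER BY, row order is unspecified: compare multisets.
    return ("ok" if Counter(target_rows) == Counter(reference_rows)
            else "logic-bug")

# Two SQLite connections stand in for the emerging and reference DBMSs.
target, reference = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (target, reference):
    conn.execute("CREATE TABLE t (c0 INT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,)])

print(run_pair(target, reference,
               "SELECT DISTINCT c0 FROM t", "SELECT DISTINCT c0 FROM t"))
# -> ok
```

Comparing multisets rather than lists is what makes order-insensitive result comparison sound for queries without ORDER BY.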

Table 2: Unknown Bugs SQLxDiff Found

Database System | Internal Errors (Unknown / Confirmed / Fixed) | Logic Bugs (Unknown / Confirmed / Fixed) | In Total
QuestDB         | 0 / 0 / 32                                    | 1 / 2 / 9                                | 44
TDEngine        | 0 / 1 / 2                                     | 1 / 1 / 1                                | 6
RisingWave      | 0 / 0 / 3                                     | 0 / 1 / 0                                | 4
CrateDB         | 0 / 0 / 2                                     | 0 / 0 / 1                                | 3
In Total        | 40                                            | 17                                       | 57

mirrors PostgreSQL syntax, thus differential testing is trivially applicable). For relational database systems, we leverage PostgreSQL [32] as the reference ground truth for testing. Throughout the evaluation process, we spent most of our effort on testing QuestDB, which differs significantly from PostgreSQL, having both newly introduced features and semantic differences. We also applied our approach to other emerging database systems to demonstrate wide applicability, uncovering 6 bugs in TDEngine, 4 bugs in RisingWave, and 3 bugs in CrateDB, though we spent less testing effort on those database systems.

Experimental setup. We performed all experiments on a personal computer with an Intel(R) Core(TM) i7-14700 CPU and 32GB RAM. The OS is Ubuntu 20.04.2 LTS. SQLxDiff can efficiently detect all confirmed or fixed bugs within six hours of running on personal computing resources.

5.1 Discovering Unknown Bugs

To detect new logic bugs in emerging database systems, we intermittently tested the latest versions of the target emerging database systems, using PostgreSQL-14.0 to derive the ground-truth results, over a period of three months, which is a typical methodology for evaluating the effectiveness of automatic testing tools [22, 36]. In most cases, SQLxDiff took a few hours until a bug was found on personal computer resources. We reported bugs after reducing bug-inducing queries and checking whether the issue had already been reported on the issue trackers to avoid duplicate bug reports. Bug-inducing test cases generated automatically are typically complex, and we reduced them to a smaller bug-inducing version by delta debugging [47]. Next, we illustrate bugs with reduced queries, omitting unnecessary query mappings.

Results. Table 2 summarizes the number of previously unknown bugs identified using our approach. We classified the identified bugs into two distinct categories: (i) Internal Errors refer to bugs where a query caused unexpected aborts, exceptions, or crashes in the target database system; (ii) Logic Bugs refer to bugs found through discrepancies flagged by the differential testing. Additionally, we categorized all bugs into three statuses: (i) Unknown bugs are those that have been identified and submitted but are awaiting further investigation by developers to determine the root cause; (ii) Confirmed bugs are those that have been acknowledged by developers but have not yet been fixed; (iii) Fixed bugs are those that have been confirmed and subsequently fixed by the developers.

In total, we identified 57 previously unknown bugs (17 logic bugs and 40 internal errors), of which 5 were confirmed and 50 were fixed. For illustration, in the subsequent paragraphs, we present some of the noteworthy bugs that SQLxDiff identified, describing them based on their root cause through our analysis and the developers' feedback.

Logic bug—incorrect string comparison. A common feature of database systems is string comparison, which is commonly supported either via specific functions like strcmp in MySQL or directly via operators (e.g., >, <, =). In QuestDB, we observed one discrepancy6 that occurred when counting distinct results with a string comparison in the predicate, as shown in Listing 3, diverging from PostgreSQL. Identifying this bug requires syntax-different differential inputs by mapping clauses to string-type keywords (i.e., from symbol to varchar) and the clause count. This example illustrates the simplicity of clause mapping and the enhanced effectiveness it brings to differential testing.

Listing 3: Incorrect String Comparison in QuestDB

CREATE TABLE test (c_0 SYMBOL);
INSERT INTO test VALUES ('A');
SELECT count_distinct(c_0) FROM test WHERE c_0>'Z';
-- Emerging Database System Result: [(1)] --
CREATE TABLE test (c_0 VARCHAR(16));
INSERT INTO test VALUES ('A');
SELECT count(DISTINCT c_0) FROM test WHERE c_0>'Z';
-- Relational Database System Result: [(0)] --

Logic bug—incorrect query concatenation. The intersect and except clauses are commonly used to combine results from two sub-queries. We found a bug,7 shown in Listing 4, which concerns the logic of the intersect and except clauses in QuestDB. The identification of this bug results from applying query concatenations, which combine three queries into a result unit. Another similar bug was found when executing complex SQL structures with multiple joins and the same column names in CrateDB, as shown in Listing 4. We observed a difference in the result set from PostgreSQL, after the necessary type mapping (e.g., casting timestamp into bigint for comparison), when executing union queries with duplicated output columns. These bugs were resolved after we submitted the bug-inducing cases to the developers.

Listing 4: Incorrect Concatenated Query in QuestDB/CrateDB

(SELECT 1 UNION ALL SELECT 1) EXCEPT (SELECT 0);
(SELECT 1 UNION ALL SELECT 1) INTERSECT (SELECT 1);
-- QuestDB: [(1),(1)] PostgreSQL: [(1)] --
CREATE TABLE test (c_0 TIMESTAMP, c_1 INT, c_2 FLOAT);
INSERT INTO test VALUES (946702800000, 8, 3.0);
INSERT INTO test VALUES (946688400000, 9, 7.0);
INSERT INTO test VALUES (946695600000, 6, 8.0);
(SELECT .. FROM test as t1 CROSS JOIN ...) UNION (...);
-- CrateDB: [(3,1,1), ..] PostgreSQL: [(3,2,1), ..] --

Logic bug—incorrect nested joins and window function. Complex query structures (e.g., with nested joins or window functions) tend to trigger bugs in emerging database systems. One bug SQLxDiff found is related to nested joins. A nested join in SQL involves the use of multiple join operations within a single query, which allows for the retrieval of data that spans multiple tables. Listing 5 shows Q1, a bug-inducing query computing an incorrect result due to an unknown bug.8 The issue is in the transitive filter pass when handling nested table joins. Another bug that SQLxDiff found is related to window functions, which calculate their results across a set of rows related to the current row. Unlike regular aggregate functions, window functions do not collapse the result

6 https://github.com/questdb/questdb/issues/3828
7 https://github.com/questdb/questdb/issues/3580
8 https://github.com/questdb/questdb/issues/4010

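The reference semantics that Listing 4 relies on (standard set operators deduplicate their results) can be checked against any engine with standard set semantics. A minimal stand-in using SQLite, which groups compound SELECTs left to right and thus matches the parenthesization in Listing 4:

```python
import sqlite3

# SQLite stands in for the reference system: standard EXCEPT/INTERSECT
# return *distinct* rows, so the duplicate (1) produced by UNION ALL
# must collapse to a single row.
conn = sqlite3.connect(":memory:")
except_rows = conn.execute(
    "SELECT 1 UNION ALL SELECT 1 EXCEPT SELECT 0").fetchall()
intersect_rows = conn.execute(
    "SELECT 1 UNION ALL SELECT 1 INTERSECT SELECT 1").fetchall()
print(except_rows, intersect_rows)  # [(1,)] [(1,)] -- QuestDB gave two rows
```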


set into a single value for each group. In QuestDB, we observed one bug-inducing case,9 shown as Q2 in Listing 5, which outputs differently compared to PostgreSQL when executing window-function queries with an order by clause and table joins.

Listing 5: Logic Bugs with Nested Joins or Window Functions

CREATE TABLE t(c0 TIMESTAMP, c1 INT, c2 INT);
INSERT INTO t VALUES('2025-01-01 10:00:00+00', 1, 1);
INSERT INTO t VALUES('2025-01-01 10:00:00+00', 1, 1);
Q1: SELECT count(1) FROM t as T1 CROSS JOIN t as T2
    WHERE T2.c0='2000-01-01 00:00:00+00' INTERSECT
    SELECT count(1) FROM t as T1 JOIN t as T2 on T1.c0=T2.c0 JOIN t as T3 ON T2.c0=T3.c0;
-- QuestDB Result: [(0)] PostgreSQL Result: [] --
Q2: SELECT avg(T1.c_1) OVER(PARTITION BY 1=1 ORDER BY T1.c_0) FROM test AS T1 CROSS JOIN test AS T2;
-- QuestDB: [(1),(1.333333)] PostgreSQL: [(1),(1.5)] --

Internal errors—incorrect syntax errors. As one category of internal errors, SQLxDiff also detects various incorrect syntax errors through expanded differential testing. We identified these bugs by observing syntax errors in QuestDB while the same queries returned correctly in PostgreSQL. As shown in Listing 6, the queries Q1 and Q2 report invalid column c0/c1 in QuestDB, whereas the same queries execute normally in PostgreSQL. These discrepancies found via differential testing help identify such unexpected syntax errors more efficiently.

Listing 6: Unexpected Syntax Errors in QuestDB

CREATE TABLE test (c0 short, c1 timestamp); -- QuestDB
CREATE TABLE test (c0 float, c1 timestamp); -- Postgres
Q1: SELECT avg(c0) FROM test UNION SELECT DISTINCT avg(c0) FROM test;
-- QuestDB: "Invalid Column c0" PostgreSQL: [()] --
Q2: SELECT count(1) FROM test AS T1 JOIN test AS T2 ON T1.c1<T2.c1 JOIN test AS T3 ON T2.c1=T3.c1;
-- QuestDB: "Invalid Column c1" PostgreSQL: [()] --

Internal errors—core dumped/out of bound. Another notable category of bugs we encountered involves internal errors where the target database systems crashed. As shown in Listing 7, SQLxDiff found a segmentation fault in TDEngine and an out-of-bounds error in CrateDB. To date, all 40 internal errors have been confirmed or fixed by the developers.

Listing 7: Internal Errors in TDEngine/CrateDB

CREATE TABLE test(c0 TIMESTAMP);
insert into test values ('2025-01-01 00:00:00.000');
Q1: select count(1) from test as a join test as b on a.c0=b.c0 and a.c0 is null;
-- TDEngine: FATAL crash signal...(core dumped) --
Q2: SELECT DISTINCT count ... sys.summits as T2 JOIN sys.summits as T3 ON T2.mountain>T3.mountain;
-- CrateDB: Index 2 out of bounds --

Discussion—false alarms. We did not observe any false alarms when running SQLxDiff. However, we encountered 3 false alarms while developing SQLxDiff. Note that any potential false alarms, as determined by us or the DBMSs' developers, can be easily addressed by modifying or removing the clause mapping. This is a common methodology and has been used in many other differential testing approaches [6–8, 21, 28, 38, 48].

5.2 Improvement on Differential Testing

One key question is how much we can improve upon traditional differential testing. We evaluate the enhanced effectiveness of our differential testing approach through three main aspects: (i) discovering more bugs, (ii) covering more unique query plans, and (iii) achieving higher query execution success rates.

Enhanced effectiveness in detecting logic bugs. SQLxDiff identifies significantly more logic bugs, benefiting from the clause mapping technique, which broadens the scope of differential testing by bridging non-shared clauses in the two target database systems. To evaluate the impact of clause mappings, we first conducted a manual analysis of all 17 logic bugs identified by SQLxDiff to determine the necessity of these mappings for triggering the bugs. Following this, we compared the clause success rates with and without the use of clause mappings.

Among the 17 logic bugs identified by SQLxDiff, 7 are detected with the assistance of clause mappings. The necessary clause mappings for reproducing logic bugs include (i) type mappings, as type keywords may vary across different database instances (e.g., the string type may be represented as symbol or varchar); without type mappings, such bugs cannot be reproduced, as the bug-inducing cases (e.g., Listing 3) would fail at the create table statement; (ii) function mappings, where functions like count_distinct() in emerging database systems require rewriting as count(distinct) for correct functionality; and (iii) feature mappings, which align new features with valid keywords in relational database systems as introduced in Section 3. However, not every logic bug necessitates reproduction via clause mappings. For instance, the inaccurate intersect/except results in Listing 4 are deemed valid in both emerging and relational database systems even without the aid of mappings. Our methodology expands differential testing by bridging syntax gaps and mapping new features, allowing many SQL queries to be valid in both systems. Our findings highlight the benefits of clause mappings in detecting bugs across different database models.

Covering more unique query plans. A query plan outlines the ordered steps (e.g., table scans, table joins) that database systems will take to access data. Created by the query optimizer, it aims to find the most efficient way to execute the query based on factors such as the database schema, data distribution, and available indexes. A higher number of executed query plans increases the probability that the testing tool can uncover unknown bugs, as demonstrated in existing works [4, 5].

We conducted a 24-hour experiment using SQLxDiff to compare the unique query plans covered by classic differential testing (i.e., without clause mapping) and an expanded scope of differential testing (i.e., SQLxDiff, with clause mapping). As shown in Figure 4, SQLxDiff covers 14,704 additional unique query plans, representing a 20% improvement after 24 hours of testing. These results suggest that the expanded scope of differential testing improves the ability

9 https://github.com/questdb/questdb/issues/3936
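The unique-query-plan metric can be illustrated with SQLite's EXPLAIN QUERY PLAN standing in for the target system's plan output; plan_of is an illustrative helper, not part of SQLxDiff:

```python
import sqlite3

# A plan is identified by the ordered sequence of its step descriptions.
def plan_of(conn, sql):
    return tuple(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c0 INT, c1 INT)")
conn.execute("CREATE INDEX idx ON t (c0)")
queries = [
    "SELECT * FROM t",                 # full scan
    "SELECT * FROM t WHERE c0 = 1",    # indexed search
    "SELECT * FROM t WHERE c0 = 2",    # same plan shape as the previous one
]
unique_plans = {plan_of(conn, q) for q in queries}
print(len(unique_plans))
```

Because plan output abstracts literal values, the two WHERE queries share one plan, so generating many literal variations of the same query does not grow this metric, whereas mapped clauses reaching new operators does.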

to execute a greater variety of query plans, thereby uncovering more unknown issues compared to traditional differential testing.

Higher query execution success rate. To better illustrate the impact of clause mappings on the testing effectiveness of differential execution in both emerging database systems and relational database systems, we compare clause execution success rates with and without clause mappings. The success rates of queries exhibit significant variations under different configurations. We noted instances where the success rate dropped to 0% when type mappings were not enabled, resulting in failures during table initialization. Conversely, the success rate remained stable when new features were infrequently incorporated into query generation. In summary, we observed a decrease in query success rates ranging from around 23.9% (e.g., when only queries with new features output syntax errors) to 100% (e.g., when all queries failed without the necessary type mappings in create table statements) on QuestDB, RisingWave, CrateDB, and PostgreSQL. This underscores how clause mappings contribute to the efficacy of differential testing on dissimilar database systems.

5.3 Comparison with Existing Techniques

We now evaluate the effectiveness of SQLxDiff compared with existing techniques. To the best of our knowledge, there is only one (non-public) existing work on testing emerging database systems, Unicorn [44], which focuses on fuzzing time-series database systems. We contacted the authors to ask whether they could share their artifacts with us for comparison but have not received a response. Unicorn proposed hybrid input generation to create valid SQL queries for time-series database systems. However, as Unicorn uses a crash oracle, it is unable to detect more complex bugs like unexpected syntax errors and logic bugs.

To further assess SQLxDiff, we ported the state-of-the-art test oracle for relational database systems, Ternary Logic Partition (TLP) [35],10 implemented in SQLancer [36], to our selected emerging database systems. Different from our approach, TLP leverages metamorphic mutations that partition one query into three sub-queries, each selecting rows based on whether a predicate evaluates to true, false, or null. We compare the effectiveness by checking how many of the unknown logic bugs SQLxDiff found can be detected by using the TLP test oracle. Then, we evaluate SQLxDiff against SQLancer by comparing unique query plans and code coverage, demonstrating our improvements in testing emerging database systems.

Listing 8: Consistent Sum Results—Ineffectiveness of TLP

-- Buggy Query QuestDB Result [] vs. PostgreSQL [(1)] --
(SELECT DISTINCT avg(c0) over(partition by 1) FROM t);
-- TLP-True Query QuestDB Result [] --
(SELECT .. over(partition by 1) FROM t WHERE c0>0);
-- TLP-False Query QuestDB Result [] --
(SELECT .. over(partition by 1) FROM t WHERE not c0>0);
-- TLP-Null Query QuestDB Result [] --
(SELECT .. over(...) FROM t WHERE c0>0 is NULL);

Effectiveness. We used the same methodology as prior works [34, 35], that is, we conducted a manual, best-effort analysis to identify bugs found by SQLxDiff that TLP overlooked. To adapt TLP to emerging database systems, we implemented TLP to partition the original queries in a manual, best-effort process, leveraging existing predicates or introducing relevant predicates that refer to columns in the original queries in cases where no predicates exist. For the 17 bug-inducing test cases to which the test oracles could be applied, TLP, unsurprisingly, does not reveal any of these bugs, showing the unique strength of differential testing across database systems. The primary reason why TLP fails to detect them is that the presence of predicates does not influence the incorrect outcome. The root causes of such bugs are traced back to other clauses, such as over(partition by), join, union, or unrelated expressions like null and features. In the cases where these bugs originate, the conceptual addition of predicate partitions does not impact the results. A specific example is illustrated in Listing 8. The bug initially occurs within a sub-query (enclosed in brackets) that fails differential testing in comparison with PostgreSQL. Upon query reduction, the bug-inducing scenario reveals that introducing brackets to a query with a window function causes the bug. Although we add predicates related to the sole column reference in the query, doing so does not influence the result, indicating that TLP fails to find this bug.

In addition to logic bugs, 18 of the 40 internal errors SQLxDiff found output inconsistent aborts, where the differential input correctly returns in the reference database system. Detecting such bugs generally requires more than a simple metamorphic-testing oracle; it usually necessitates additional handlers for errors, exceptions, or crashes (e.g., Unicorn [44]). In contrast, our enhanced differential testing leverages the correct output of the reference database system, enabling efficient identification of these bugs.

Unique query plans. We conducted comparison experiments against SQLancer, evaluating the number of unique query plans observed over 24 hours to simulate the bug-finding capabilities of various testing approaches. Although SQLancer supports QuestDB, this support is only in its initial stages. We also found open bugs11 in its issue tracker, which we encountered and fixed, aiming for a

Figure 4: SQLxDiff's Improvement on Unique Query Plans

Figure 5: Comparison of Unique Query Plans

10 Reasons for using TLP include: TLP is the state-of-the-art test oracle for finding logic bugs in SQL databases. TLP can be seen as a generalization of Non-Optimizing Reference Engine Construction (NoREC). Query Plan Guidance (QPG) is a test-case generation approach, and not a test oracle. Pivoted Query Synthesis (PQS) is no longer maintained in SQLancer and thus not used for comparison. Differential Query Execution (DQE) aims at testing Data Manipulation Language (DML) statements, which are out of the scope of this paper.
11 https://github.com/sqlancer/sqlancer/issues/712
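For readers unfamiliar with the oracle, TLP's partitioning invariant can be sketched in a few lines (a minimal stand-in using SQLite). In Listing 8, the faulty window-function sub-query is untouched by such predicate partitions, which is why the partitioned queries all reproduce the same wrong result:

```python
import sqlite3
from collections import Counter

# TLP's invariant: a query's result equals the union of its partitions
# under predicate p, NOT p, and p IS NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c0 INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (-2,), (None,)])

original = conn.execute("SELECT c0 FROM t").fetchall()
partitions = []
for guard in ("(c0 > 0)", "NOT (c0 > 0)", "(c0 > 0) IS NULL"):
    partitions += conn.execute(f"SELECT c0 FROM t WHERE {guard}").fetchall()

# The three partitions are disjoint and exhaustive, so the multisets agree.
assert Counter(original) == Counter(partitions)
```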

Figure 6: Code Coverage Comparison with SQLancer (instruction coverage after 10, 60, 600, and 1440 minutes, shown for io.questdb.cairo and overall).

As shown in Figure 5, SQLxDiff demonstrated significant improvements compared to SQLancer. The statistics indicate that even state-of-the-art testing tools face challenges in fully supporting emerging database systems due to semantic or syntactic discrepancies. This aligns with our observation that SQLancer's generated queries for QuestDB are limited to a few simple query structures. Our approach leverages enhanced differential testing to expand the testing scope through clause mapping, thereby achieving a much higher number of unique query plans.

Coverage. Code coverage is a widely used metric to evaluate the effectiveness of database system testing approaches [35, 36]. We evaluated the code coverage of SQLxDiff and SQLancer on QuestDB over 24 hours (as suggested in previous work [23]) using the publicly available JaCoCo [18] library. We present two statistics: overall code coverage, and relevant code coverage focused on the SQL engine of QuestDB (i.e., the io.questdb.cairo package). The focused data excludes components not involved in query processing, such as functionality for establishing connections and authentication. Figure 6 shows that SQLxDiff surpasses SQLancer in both overall and focused code coverage. In the other packages, SQLxDiff also achieves code coverage comparable to SQLancer's. A previous study [17] raised concerns about using code coverage as a convincing metric for evaluating testing approaches: higher coverage does not necessarily detect more bugs without a proper test oracle.

6 Related Work

We briefly summarize the most relevant related work.

Detecting logic bugs in database systems. Logic bugs, which refer to incorrect results returned by database systems, are difficult to detect because they require a test oracle, a mechanism that decides whether a test case's result is expected. Song et al. proposed the oracle Differential Query Execution (DQE) [39], which detects logic bugs in database systems by checking whether SQL queries with the same predicate access the same rows. Rigger et al. proposed the oracles Pivoted Query Synthesis (PQS) [36], Non-Optimizing Reference Engine Construction (NoREC) [34], and TLP [35], which detect logic bugs in relational database systems by checking the consistency of the results of several related queries, and have found hundreds of bugs. To generate test cases for such test oracles, SQLRight [25] leverages code coverage to guide test-case generation, and Query Plan Guidance (QPG) [4] steers test cases toward unseen query plans, aiming to trigger diverse behaviors. Rather than finding bugs in relational database systems, we focus on emerging databases, which differ both from each other and from relational databases. Thus, the challenge is how to achieve effective bug finding, including logic bugs, given the differences between the database systems.

Differential testing. Another line of research for detecting logic bugs is differential testing [28], which detects bugs by executing a test case on multiple versions or instances of systems that implement the same semantics; any discrepancy indicates a potential bug in one of these systems. Researchers have utilized this method to detect bugs across various domains, such as web services [7], Java Virtual Machine (JVM) implementations [8], compilers [46], and network protocols [6]. Differential testing was applied to database systems by RAGS [38], which executes a query on multiple different database systems and compares their results. APOLLO [21] also applies differential testing, detecting performance issues by comparing the execution time of the same query on different versions of the same database system. In this work, we propose a novel approach to enhance the scalability of differential testing, making it applicable for uncovering logic bugs in emerging database systems that may have nontrivial dissimilarities from traditional relational databases.

Detecting memory errors in database systems. Most previous methods for testing database systems have focused on memory errors, which do not require an explicit test oracle. Grey-box fuzzers, such as AFL [1], use mutation and code coverage as the fuzzing strategy. However, for database systems, such mutation typically yields invalid test cases due to the constraints of the SQL grammar. Squirrel [49] uses a syntax-preserving mutation method to increase the rate of valid test cases during mutation. Generation-based methods, such as SQLSmith [2], DynSQL [20], and ADUSA [26], generate test cases according to a grammar. Griffin [12] uses a grammar-free test-case generation method to reduce the human effort of constructing grammars for database systems. While these works generate test cases efficiently, they do not tackle the test-oracle problem required to find logic bugs.

7 Conclusion

Testing emerging database systems to find logic and other bugs is challenging even when restricted to SQL-like systems, as each has differences in syntax and semantics. The core insight behind our work is recognizing that emerging database systems can be viewed as extensions of relational database systems, sharing substantial functionality. We propose an extension of differential testing that uses relational database systems as the ground truth, and we show how to deal with the challenges through an approach that combines finding shared clauses, expanding clauses with mappings, and a simple query generator that has broad applicability with shared and mapped clauses. We applied SQLxDiff to four popular emerging database systems, uncovering a total of 57 unknown bugs, including 17 logic bugs and 40 internal errors. Vendors have confirmed or fixed 55 of these bugs. We believe that SQLxDiff shows the feasibility of a unified testing framework for emerging database systems, which is important for enhancing their robustness and reliability.
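As a concrete illustration of the differential-testing idea discussed above, the following sketch runs one query on two instances and flags any result discrepancy. It is a simplified stand-in, not SQLxDiff's implementation: both instances are in-memory SQLite databases, and the query is a hypothetical generated test case.

```python
import sqlite3

# Two independent instances stand in for an emerging system under test and
# the relational reference; a real harness would connect to distinct systems.
SETUP = """
    CREATE TABLE t (a INT, b TEXT);
    INSERT INTO t VALUES (1, 'x'), (2, 'y'), (3, NULL);
"""

def run(db, query):
    # Sort rows so that result comparison ignores row order.
    return sorted(db.execute(query).fetchall(), key=repr)

emerging = sqlite3.connect(":memory:")   # stand-in: system under test
reference = sqlite3.connect(":memory:")  # stand-in: relational ground truth
for db in (emerging, reference):
    db.executescript(SETUP)

query = "SELECT a FROM t WHERE b >= 'x'"  # hypothetical generated test case
lhs, rhs = run(emerging, query), run(reference, query)
assert lhs == rhs, f"discrepancy (potential logic bug): {lhs} vs {rhs}"
print("results agree:", lhs)
```

In SQLxDiff's setting, the two connections would instead point at an emerging database system and a relational database, with the generated query restricted to shared or mapped clauses so that both systems are expected to agree.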
Enhanced Differential Testing in Emerging Database Systems Conf ’25, June 25–28, 2025, Trondheim, Norway

References

[1] 2013. American Fuzzy Lop (AFL) Fuzzer. http://lcamtuf.coredump.cx/afl/technical_details.txt.
[2] Andreas Seltenreich, Bo Tang, and Sjoerd Mullender. 2024. SQLSmith. https://github.com/anse1/sqlsmith.
[3] Apache. 2024. https://tinkerpop.apache.org/.
[4] Jinsheng Ba and Manuel Rigger. 2023. Testing Database Engines via Query Plan Guidance. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 2060–2071. https://doi.org/10.1109/ICSE48619.2023.00174
[5] Jinsheng Ba and Manuel Rigger. 2024. Keep It Simple: Testing Databases via Differential Query Plans. Proceedings of the ACM on Management of Data 2, 3 (2024), 1–26.
[6] David Brumley, Juan Caballero, Zhenkai Liang, and James Newsome. 2007. Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation. In 16th USENIX Security Symposium (USENIX Security 07). https://www.usenix.org/conference/16th-usenix-security-symposium/towards-automatic-discovery-deviations-binary
[7] Peter Chapman and David Evans. 2011. Automated black-box detection of side-channel vulnerabilities in web applications. In Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS '11). 263–274. https://doi.org/10.1145/2046707.2046737
[8] Yuting Chen, Ting Su, Chengnian Sun, Zhendong Su, and Jianjun Zhao. 2016. Coverage-directed differential testing of JVM implementations. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16). 85–99. https://doi.org/10.1145/2908080.2908095
[9] CrateDB. 2024. https://cratedb.com/.
[10] dbdb. 2024. https://dbdb.io/.
[11] DBEngines. 2024. https://db-engines.com/.
[12] Jingzhou Fu, Jie Liang, Zhiyong Wu, Mingzhe Wang, and Yu Jiang. 2023. Griffin: Grammar-Free DBMS Fuzzing. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE '22). Article 49, 12 pages. https://doi.org/10.1145/3551349.3560431
[13] Ying Fu, Zhiyong Wu, Yuanliang Zhang, Jie Liang, Jingzhou Fu, Yu Jiang, Shanshan Li, and Xiangke Liao. 2025. THANOS: DBMS Bug Detection via Storage Engine Rotation Based Differential Testing. In Proceedings of the IEEE/ACM 47th International Conference on Software Engineering. 1–12.
[14] Paolo Guagliardo and Leonid Libkin. 2017. Correctness of SQL queries on databases with nulls. ACM SIGMOD Record 46, 3 (2017), 5–16.
[15] Paolo Guagliardo and Leonid Libkin. 2017. A formal semantics of SQL queries, its validation, and applications. Proceedings of the VLDB Endowment (PVLDB) 11, 1 (2017), 27–39.
[16] Ziyue Hua, Wei Lin, Luyao Ren, Zongyang Li, Lu Zhang, Wenpin Jiao, and Tao Xie. 2023. GDsmith: Detecting Bugs in Cypher Graph Database Engines. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023). 163–174. https://doi.org/10.1145/3597926.3598046
[17] Laura Inozemtseva and Reid Holmes. 2014. Coverage is not strongly correlated with test suite effectiveness. In Proceedings of the 36th International Conference on Software Engineering. 435–445.
[18] JaCoCo. 2024. https://www.jacoco.org/.
[19] Yuancheng Jiang, Jiahao Liu, Jinsheng Ba, Roland H. C. Yap, Zhenkai Liang, and Manuel Rigger. 2024. Detecting Logic Bugs in Graph Database Management Systems via Injective and Surjective Graph Query Transformation. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE '24). Article 46, 12 pages. https://doi.org/10.1145/3597503.3623307
[20] Zu-Ming Jiang, Jia-Ju Bai, and Zhendong Su. 2023. DynSQL: Stateful fuzzing for database management systems with complex and valid SQL query generation. In Proceedings of the 32nd USENIX Conference on Security Symposium (SEC '23). Article 277, 17 pages.
[21] Jinho Jung, Hong Hu, Joy Arulraj, Taesoo Kim, and Woonhak Kang. 2019. APOLLO: automatic detection and diagnosis of performance regressions in database systems. Proc. VLDB Endow. 13, 1 (2019), 57–70. https://doi.org/10.14778/3357377.3357382
[22] Matteo Kamm, Manuel Rigger, Chengyu Zhang, and Zhendong Su. 2023. Testing Graph Database Engines via Query Partitioning. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023). 140–149. https://doi.org/10.1145/3597926.3598044
[23] George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating Fuzz Testing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS '18). 2123–2138. https://doi.org/10.1145/3243734.3243804
[24] Yu Liang, Song Liu, and Hong Hu. 2022. Detecting Logical Bugs of DBMS with Coverage-based Guidance. In 31st USENIX Security Symposium (USENIX Security 22). 4309–4326.
[25] Yu Liang, Song Liu, and Hong Hu. 2022. Detecting Logical Bugs of DBMS with Coverage-based Guidance. In 31st USENIX Security Symposium (USENIX Security 22). 4309–4326.
[26] Xinyu Liu, Qi Zhou, Joy Arulraj, and Alessandro Orso. 2022. Automatic detection of performance bugs in database systems using equivalent queries. In Proceedings of the 44th International Conference on Software Engineering (ICSE '22). 225–236. https://doi.org/10.1145/3510003.3510093
[27] Qiuyang Mang, Aoyang Fang, Boxi Yu, Hanfei Chen, and Pinjia He. 2024. Testing Graph Database Systems via Equivalent Query Rewriting. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE '24). Article 143, 12 pages. https://doi.org/10.1145/3597503.3639200
[28] William M. McKeeman. 1998. Differential testing for software. Digital Technical Journal 10, 1 (1998), 100–107.
[29] MySQL. 2024. https://www.mysql.com/.
[30] OpenAI. 2024. https://chat.openai.com/.
[31] Neo4j Graph Platform. 2024. https://neo4j.com/.
[32] PostgreSQL. 2024. https://www.postgresql.org/.
[33] QuestDB. 2024. https://questdb.io/.
[34] Manuel Rigger and Zhendong Su. 2020. Detecting optimization bugs in database engines via non-optimizing reference engine construction. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020). 1140–1152. https://doi.org/10.1145/3368089.3409710
[35] Manuel Rigger and Zhendong Su. 2020. Finding bugs in database systems via query partitioning. Proc. ACM Program. Lang. 4, OOPSLA, Article 211 (2020), 30 pages. https://doi.org/10.1145/3428279
[36] Manuel Rigger and Zhendong Su. 2020. Testing database engines via pivoted query synthesis. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI '20). Article 38, 16 pages.
[37] RisingWave. 2024. https://www.risingwave.com/.
[38] Donald R. Slutz. 1998. Massive Stochastic Testing of SQL. In Proceedings of the 24th International Conference on Very Large Data Bases (VLDB '98). 618–622.
[39] Jiansen Song, Wensheng Dou, Ziyu Cui, Qianwang Dai, Wei Wang, Jun Wei, Hua Zhong, and Tao Huang. 2023. Testing Database Systems via Differential Query Execution. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 2072–2084. https://doi.org/10.1109/ICSE48619.2023.00175
[40] SQLite. 2024. https://www.sqlite.org/.
[41] TDEngine. 2024. https://tdengine.com/.
[42] Timescale. 2024. https://www.timescale.com/.
[43] Wikipedia. 2024. Gartner hype cycle. https://en.wikipedia.org/wiki/Gartner_hype_cycle.
[44] Zhiyong Wu, Jie Liang, Mingzhe Wang, Chijin Zhou, and Yu Jiang. 2022. Unicorn: detect runtime errors in time-series databases with hybrid input synthesis. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022). 251–262. https://doi.org/10.1145/3533767.3534364
[45] Rui Yang, Yingying Zheng, Lei Tang, Wensheng Dou, Wei Wang, and Jun Wei. 2023. Randomized Differential Testing of RDF Stores. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 136–140. https://doi.org/10.1109/ICSE-Companion58688.2023.00041
[46] Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). 283–294. https://doi.org/10.1145/1993498.1993532
[47] A. Zeller and R. Hildebrandt. 2002. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering 28, 2 (2002), 183–200. https://doi.org/10.1109/32.988498
[48] Yingying Zheng, Wensheng Dou, Yicheng Wang, Zheng Qin, Lei Tang, Yu Gao, Dong Wang, Wei Wang, and Jun Wei. 2022. Finding bugs in Gremlin-based graph database systems via randomized differential testing. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022). 302–313. https://doi.org/10.1145/3533767.3534409
[49] Rui Zhong, Yongheng Chen, Hong Hu, Hangfan Zhang, Wenke Lee, and Dinghao Wu. 2020. SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS '20). 955–970. https://doi.org/10.1145/3372297.3417260
[50] Zeyang Zhuang, Penghui Li, Pingchuan Ma, Wei Meng, and Shuai Wang. 2024. Testing Graph Database Systems via Graph-Aware Metamorphic Relations. Proc. VLDB Endow. 17, 4 (2024), 836–848. https://doi.org/10.14778/3636218.3636236