DuckDB Docs
Contents
Summary
Documentation
Connect
  Connect
  Concurrency
Data Import
  Importing Data
  CSV Files
    CSV Import
    CSV Auto Detection
    Reading Faulty CSV Files
    CSV Import Tips
  JSON Files
    JSON Loading
  Multiple Files
    Reading Multiple Files
    Combining Schemas
  Parquet Files
    Reading and Writing Parquet Files
    Querying Parquet Metadata
    Parquet Encryption
    Parquet Tips
  Partitioning
    Hive Partitioning
    Partitioned Writes
  Appender
  INSERT Statements
Client APIs
  Client APIs Overview
  C
    Overview
    Startup & Shutdown
    Configuration
    Query
    Data Chunks
    Values
    Types
    Prepared Statements
    Appender
    Table Functions
DuckDB Documentation
Configuration
  Configuration
  Pragmas
  Secrets Manager
SQL
  SQL Introduction
  Statements
    Statements Overview
Extensions
  Extensions
  Official Extensions
Guides
Performance
  Performance Guide
  Schema
  Indexing
  Environment
  File Formats
  Tuning Workloads
  My Workload Is Slow
  Benchmarks
ODBC
  ODBC 101: A Duck Themed Guide to ODBC
Python
  Installing the Python Client
  Executing SQL in Python
  Jupyter Notebooks
  SQL on Pandas
  Import from Pandas
  Export to Pandas
  SQL on Apache Arrow
  Import from Apache Arrow
  Export to Apache Arrow
  Relational API on Pandas
  Multiple Python Threads
  Integration with Ibis
  Integration with Polars
  Using fsspec Filesystems
Internals
  Overview of DuckDB Internals
  Storage
  Execution Format
Acknowledgments
Summary
This document contains DuckDB's official documentation and guides in a single‑file easy‑to‑search form. If you find any issues, please
report them as a GitHub issue. Contributions are very welcome in the form of pull requests. If you are considering submitting a contribution
to the documentation, please consult our contributor guide.
Code repositories:
Documentation
Connect
Connect
To use DuckDB, you must first create a connection to a database. The exact syntax varies between the client APIs but it typically involves
passing an argument to configure persistence.
Persistence
DuckDB can operate in both persistent mode, where the data is saved to disk, and in in‑memory mode, where the entire data set is stored
in the main memory.
Persistent Database
To create or open a persistent database, set the path of the database file, e.g., my_database.duckdb, when
creating the connection. This path can point to an existing database or to a file that does not yet exist; DuckDB will open or create a
database at that location as needed. The file may have an arbitrary extension, but .db and .duckdb are two common choices.
Tip. Running on a persistent database allows spilling to disk, thus facilitating larger-than-memory workloads (i.e., out-of-core
processing).
Starting with v0.10, DuckDB's storage format is backwards-compatible, i.e., DuckDB is able to read database files produced by older
versions of DuckDB. For example, DuckDB v0.10 can read and operate on files created by the previous DuckDB version, v0.9. For more
details on DuckDB's storage format, see the storage page.
In-Memory Database
DuckDB can operate in in-memory mode. In most clients, this can be activated by passing the special value :memory:
as the database file or by omitting the database file argument. In in-memory mode, no data is persisted to disk, so all data is
lost when the process finishes.
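The same distinction can be sketched in SQL with the ATTACH statement, which opens an additional database next to the one the client connected to. This is a minimal illustration, not a replacement for the client-specific connection call; the file name and the aliases persistent_db and scratch_db are made up for the example.

```sql
-- Open (or create, if it does not exist yet) a persistent database file.
-- 'my_database.duckdb' is a placeholder path.
ATTACH 'my_database.duckdb' AS persistent_db;

-- Attach an additional in-memory database; its contents are lost when the process exits.
ATTACH ':memory:' AS scratch_db;

-- Tables are addressed through the alias of the database they live in.
CREATE TABLE IF NOT EXISTS persistent_db.kept (i INTEGER);
CREATE TABLE IF NOT EXISTS scratch_db.temporary_data (i INTEGER);

-- List the attached databases.
SHOW DATABASES;
```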
Concurrency
Handling Concurrency
When a single process both reads from and writes to the database, DuckDB supports multiple writer threads using a combination of MVCC
(Multi-Version Concurrency Control) and optimistic concurrency control (see Concurrency within a Single Process), but all within that
single writer process. The reason for this concurrency model is to allow for the caching of data in RAM for faster analytical queries,
rather than going back and forth to disk during each query. It also allows the caching of function pointers, the database catalog, and
other items so that subsequent queries on the same connection are faster.
Note. DuckDB is optimized for bulk operations, so executing many small transactions is not a primary design goal.
DuckDB supports concurrency within a single process according to the following rules. As long as there are no write conflicts, multiple
concurrent writes will succeed. Appends will never conflict, even on the same table. Multiple threads can also simultaneously update
separate tables or separate subsets of the same table. Optimistic concurrency control comes into play when two threads attempt to edit
(update or delete) the same row at the same time. In that situation, the second thread to attempt the edit will fail with a conflict error.
Writing to DuckDB from multiple processes is not supported automatically and is not a primary design goal (see Handling Concurrency).
If multiple processes must write to the same file, several design patterns are possible, but would need to be implemented in application
logic. For example, each process could acquire a cross‑process mutex lock, then open the database in read/write mode and close it when the
query is complete. Instead of using a mutex lock, each process could retry the connection if another process is already connected to
the database (being sure to close the connection upon query completion). Another alternative would be to do multi‑process transactions
on a MySQL, PostgreSQL, or SQLite database, and use DuckDB's MySQL, PostgreSQL, or SQLite extensions to execute analytical queries on
that data periodically.
Additional options include writing data to Parquet files and using DuckDB's ability to read multiple Parquet files, taking a similar approach
with CSV files, or creating a web server to receive requests and manage reads and writes to DuckDB.
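The Parquet-based pattern mentioned above can be sketched as follows. The file names, column names, and the idea of one file per writer are assumptions made for illustration; in a real setup each COPY would run in a separate process with its own in-memory or private database.

```sql
-- Each writer process exports its own batch to a separate file.
-- File and column names are illustrative.
COPY (SELECT 1 AS id, 'from writer 1' AS payload) TO 'events_writer1.parquet' (FORMAT PARQUET);
COPY (SELECT 2 AS id, 'from writer 2' AS payload) TO 'events_writer2.parquet' (FORMAT PARQUET);

-- A reader process then queries all files at once through a glob pattern.
SELECT * FROM read_parquet('events_writer*.parquet') ORDER BY id;
```

Because no two processes ever open the same file for writing, this avoids the single-writer restriction at the cost of managing the files in application logic.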
Data Import
Importing Data
The first step to using a database system is to insert data into that system. DuckDB provides several data ingestion methods that allow you
to easily and efficiently fill up the database. In this section, we provide an overview of these methods so you can select which one is correct
for you.
Insert Statements
Insert statements are the standard way of loading data into a database system. They are suitable for quick prototyping, but should be
avoided for bulk loading as they have significant per‑row overhead.
For a more detailed description, see the page on the INSERT statement.
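A minimal sketch of loading a few rows with INSERT statements; the table and values are invented for the example.

```sql
-- Create a small table and add rows with INSERT.
CREATE TABLE people (id INTEGER, name VARCHAR);

-- Several rows can be supplied in a single statement.
INSERT INTO people VALUES (1, 'Mark'), (2, 'Hannes');

-- Rows can also come from a query.
INSERT INTO people (id, name) SELECT 3, 'Pedro';

SELECT * FROM people ORDER BY id;
```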
CSV Loading
Data can be efficiently loaded from CSV files using the read_csv function or the COPY statement.
You can also load data from compressed (e.g., compressed with gzip) CSV files, for example:
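The following sketch first writes two small sample files so that it can be run as-is, then reads them back. The file names, the birds table, and its columns are assumptions for illustration; writing the gzip file relies on the COMPRESSION option of COPY ... TO.

```sql
-- Create small sample files so the example is self-contained.
COPY (SELECT 1 AS id, 'duck' AS name UNION ALL SELECT 2, 'goose') TO 'birds.csv' (HEADER);
COPY (SELECT 1 AS id, 'duck' AS name UNION ALL SELECT 2, 'goose') TO 'birds.csv.gz' (HEADER, COMPRESSION 'gzip');

-- Query a CSV file directly with read_csv.
SELECT * FROM read_csv('birds.csv');

-- Load the file into an existing table with COPY.
CREATE TABLE birds (id INTEGER, name VARCHAR);
COPY birds FROM 'birds.csv' (HEADER);

-- Read a gzip-compressed CSV file; the compression is inferred from the extension.
SELECT * FROM read_csv('birds.csv.gz');
```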
Parquet Loading
Parquet files can be efficiently loaded and queried using the read_parquet function.
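A short sketch, using a sample file written on the spot (the file and column names are illustrative):

```sql
-- Write a small Parquet file so the example is self-contained.
COPY (SELECT 1 AS id, 'duck' AS name UNION ALL SELECT 2, 'goose') TO 'birds.parquet' (FORMAT PARQUET);

-- Query the file directly.
SELECT * FROM read_parquet('birds.parquet');

-- Only the referenced columns and matching rows need to be returned.
SELECT name FROM read_parquet('birds.parquet') WHERE id = 1;
```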
JSON Loading
JSON files can be efficiently loaded and queried using the read_json_auto function.
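A short sketch, assuming the JSON extension is available (it is bundled with most DuckDB distributions); the file and column names are illustrative:

```sql
-- Write a small JSON file so the example is self-contained.
COPY (SELECT 1 AS id, 'duck' AS name UNION ALL SELECT 2, 'goose') TO 'birds.json' (FORMAT JSON);

-- Let read_json_auto infer the layout and the column types.
SELECT * FROM read_json_auto('birds.json');
```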
Appender
In several APIs (C, C++, Go, Java, and Rust), the Appender can be used as an alternative for bulk data loading. This class can be used to
efficiently add rows to the database system without using SQL statements.
CSV Files
CSV Import
Examples
CSV Loading
CSV loading, i.e., importing CSV files to the database, is a very common, and yet surprisingly tricky, task. While CSVs seem simple on the
surface, there are a lot of inconsistencies found within CSV files that can make loading them a challenge. CSV files come in many different
varieties, are often corrupt, and do not have a schema. The CSV reader needs to cope with all of these different situations.
The DuckDB CSV reader can automatically infer which configuration flags to use by analyzing the CSV file using the CSV sniffer. This will
work correctly in most situations, and should be the first option attempted. In rare situations where the CSV reader cannot figure out the
correct configuration, it is possible to manually configure the CSV reader to correctly parse the CSV file. See the auto detection page for
more information.
Parameters
Below are parameters that can be passed to the CSV reader. These parameters are accepted by both the COPY statement and the read_csv
function.
| Name | Description | Type | Default |
|:--|:--|:--|:--|
| all_varchar | Option to skip type detection for CSV parsing and assume all columns to be of type VARCHAR. | BOOL | false |
| allow_quoted_nulls | Option to allow the conversion of quoted values to NULL values. | BOOL | true |
| auto_detect | Enables auto detection of CSV parameters. | BOOL | true |
| auto_type_candidates | This option allows you to specify the types that the sniffer will use when detecting CSV column types, e.g., SELECT * FROM read_csv('csv_file.csv', auto_type_candidates=['BIGINT', 'DATE']). The VARCHAR type is always included in the detected types (as a fallback option). | TYPE[] | ['SQLNULL', 'BOOLEAN', 'BIGINT', 'DOUBLE', 'TIME', 'DATE', 'TIMESTAMP', 'VARCHAR'] |
| columns | A struct that specifies the column names and column types contained within the CSV file (e.g., {'col1': 'INTEGER', 'col2': 'VARCHAR'}). Using this option implies that auto detection is not used. | STRUCT | (empty) |
| compression | The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.csv.gz will use gzip, t.csv will use none). Options are none, gzip, zstd. | VARCHAR | auto |
| dateformat | Specifies the date format to use when parsing dates. See Date Format. | VARCHAR | (empty) |
| decimal_separator | The decimal separator of numbers. | VARCHAR | . |
| delim or sep | Specifies the string that separates columns within each row (line) of the file. | VARCHAR | , |
| escape | Specifies the string that should appear before a data character sequence that matches the quote value. | VARCHAR | " |
| filename | Whether or not an extra filename column should be included in the result. | BOOL | false |
| force_not_null | Do not match the specified columns' values against the NULL string. In the default case where the NULL string is empty, this means that empty values will be read as zero-length strings rather than NULLs. | VARCHAR[] | [] |
| header | Specifies that the file contains a header line with the names of each column in the file. | BOOL | false |
| hive_partitioning | Whether or not to interpret the path as a Hive partitioned path. | BOOL | false |
| ignore_errors | Option to ignore any parsing errors encountered and instead ignore rows with errors. | BOOL | false |
| max_line_size | The maximum line size in bytes. | BIGINT | 2097152 |
| names | The column names as a list, see example. | VARCHAR[] | (empty) |
| new_line | Set the new line character(s) in the file. Options are '\r', '\n', or '\r\n'. | VARCHAR | (empty) |
| normalize_names | Boolean value that specifies whether or not column names should be normalized, removing any non-alphanumeric characters from them. | BOOL | false |
| null_padding | If this option is enabled, when a row lacks columns, it will pad the remaining columns on the right with null values. | BOOL | false |
| nullstr | Specifies the string that represents a NULL value. | VARCHAR | (empty) |
| parallel | Whether or not the parallel CSV reader is used. | BOOL | true |
| quote | Specifies the quoting string to be used when a data value is quoted. | VARCHAR | " |
| sample_size | The number of sample rows for auto detection of parameters. | BIGINT | 20480 |
| skip | The number of lines at the top of the file to skip. | BIGINT | 0 |
| timestampformat | Specifies the date format to use when parsing timestamps. See Date Format. | VARCHAR | (empty) |
| types or dtypes | The column types as either a list (by position) or a struct (by name). Example here. | VARCHAR[] or STRUCT | (empty) |
| union_by_name | Whether the columns of multiple schemas should be unified by name, rather than by position. | BOOL | false |
CSV Functions
Note. Deprecated. DuckDB v0.10.0 introduced breaking changes to the read_csv function. Namely, the read_csv function now
attempts auto-detecting the CSV parameters, making its behavior identical to the old read_csv_auto function. If you would like to
use read_csv with its old behavior, turn off the auto-detection manually by using read_csv(..., auto_detect = false).
The read_csv function automatically attempts to figure out the correct configuration of the CSV reader using the CSV sniffer. It also
automatically deduces types of columns. If the CSV file has a header, it will use the names found in that header to name the columns.
Otherwise, the columns will be named column0, column1, column2, .... An example with the flights.csv file:
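A minimal sketch of such a call. The first statement recreates flights.csv with the contents shown in the dialect detection example of this chapter so that the query can be run as-is; with an existing flights.csv, only the last statement is needed.

```sql
-- Recreate the sample file (pipe-delimited, with a header row).
COPY (
    SELECT * FROM (VALUES
        (DATE '1988-01-01', 'AA', 'New York, NY', 'Los Angeles, CA'),
        (DATE '1988-01-02', 'AA', 'New York, NY', 'Los Angeles, CA'),
        (DATE '1988-01-03', 'AA', 'New York, NY', 'Los Angeles, CA')
    ) AS t(FlightDate, UniqueCarrier, OriginCityName, DestCityName)
) TO 'flights.csv' (HEADER, DELIMITER '|');

-- Let the sniffer detect the delimiter, header, and column types.
SELECT * FROM read_csv('flights.csv');
```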
The path can either be a relative path (relative to the current working directory) or an absolute path.
If we set delim/sep, quote, escape, or header explicitly, we can bypass the automatic detection of this particular parameter:
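A sketch of such an override. The sample file is recreated first so the example is self-contained; its contents mirror the flights.csv file used in this chapter.

```sql
-- Recreate the sample file (pipe-delimited, with a header row).
COPY (
    SELECT * FROM (VALUES
        (DATE '1988-01-01', 'AA', 'New York, NY', 'Los Angeles, CA'),
        (DATE '1988-01-02', 'AA', 'New York, NY', 'Los Angeles, CA')
    ) AS t(FlightDate, UniqueCarrier, OriginCityName, DestCityName)
) TO 'flights.csv' (HEADER, DELIMITER '|');

-- delim and header are given explicitly, so they are not auto-detected;
-- the remaining options (quote, escape, column types) are still sniffed.
SELECT * FROM read_csv('flights.csv', delim = '|', header = true);
```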
Multiple files can be read at once by providing a glob or a list of files. Refer to the multiple files section for more information.
The COPY statement can be used to load data from a CSV file into a table. This statement has the same syntax as the one used in PostgreSQL.
To load the data using the COPY statement, we must first create a table with the correct schema (which matches the order of the columns
in the CSV file and uses types that fit the values in the CSV file). COPY detects the CSV's configuration options automatically.
If we want to manually specify the CSV format, we can do so using the configuration options of COPY.
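-- Create the target table; the column order and types must match the CSV file.
CREATE TABLE ontime (flightdate DATE, uniquecarrier VARCHAR, origincityname VARCHAR, destcityname VARCHAR);
-- Load the file, stating the delimiter and the presence of a header explicitly.
COPY ontime FROM 'flights.csv' (DELIMITER '|', HEADER);
SELECT * FROM ontime;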
DuckDB supports reading erroneous CSV files. For details, see the Reading Faulty CSV Files page.
Limitations
The CSV reader only supports input files using UTF‑8 character encoding. For CSV files using different encodings, use e.g. the iconv
command‑line tool to convert them to UTF‑8.
CSV Auto Detection
When using read_csv, the system tries to automatically infer how to read the CSV file using the CSV sniffer. This step is necessary because
CSV files are not self‑describing and come in many different dialects. The auto‑detection works roughly as follows:
• Detect the dialect of the CSV file (delimiter, quoting rule, escape)
• Detect the types of each of the columns
• Detect whether or not the file has a header row
By default the system will try to auto‑detect all options. However, options can be individually overridden by the user. This can be useful in
case the system makes a mistake. For example, if the delimiter is chosen incorrectly, we can override it by calling read_csv with an
explicit delimiter (e.g., read_csv('file.csv', delim = '|')).
The detection works by operating on a sample of the file. The size of the sample can be modified by setting the sample_size parameter.
The default sample size is 20480 rows. Setting the sample_size parameter to -1 means the entire file is read for sampling. The way
sampling is performed depends on the type of file. If we are reading from a regular file on disk, we will jump into the file and try to sample
from different locations in the file. If we are reading from a file in which we cannot jump, such as a .gz compressed CSV file or stdin,
samples are taken only from the beginning of the file.
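The effect of sample_size can be tried with a generated file that has more rows than the default sample. This is a sketch; the file name and columns are invented for the example.

```sql
-- Generate a file with 100,000 rows, more than the default sample of 20,480 rows.
COPY (SELECT i AS id, i * 1.5 AS value FROM range(100000) t(i)) TO 'numbers.csv' (HEADER);

-- Sniff using only the first 1,000 sampled rows.
SELECT count(*) FROM read_csv('numbers.csv', sample_size = 1000);

-- Sniff using the entire file; slower, but no row is left unexamined.
SELECT count(*) FROM read_csv('numbers.csv', sample_size = -1);
```

A larger sample mainly matters when rows deep in the file would change the detected dialect or column types.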
sniff_csv Function
It is possible to run the CSV sniffer as a separate step using the sniff_csv(filename) function, which returns the detected CSV properties
as a table with a single row. The sniff_csv function accepts an optional sample_size parameter to configure the number of
rows sampled.
FROM sniff_csv('my_file.csv');
FROM sniff_csv('my_file.csv', sample_size = 1000);
| Column name | Description | Example |
|:--|:--|:--|
| Delimiter | delimiter | , |
| Quote | quote character | " |
| Escape | escape | \ |
| NewLineDelimiter | new-line delimiter | \r\n |
| SkipRow | number of rows skipped | 1 |
| HasHeader | whether the CSV has a header | true |
| Columns | column types encoded as a LIST of STRUCTs | ({'name': 'VARCHAR', 'age': 'BIGINT'}) |
| DateFormat | date format | %d/%m/%Y |
| TimestampFormat | timestamp format | %Y-%m-%dT%H:%M:%S.%f |
| UserArguments | arguments used to invoke sniff_csv | sample_size = 1000 |
| Prompt | prompt ready to be used to read the CSV | FROM read_csv('my_file.csv', auto_detect=false, delim=',', ...) |
Prompt
The Prompt column contains a SQL command with the configurations detected by the sniffer.
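A sketch of inspecting the sniffer output. The sample file is created first so the queries can be run as-is; the column names Delimiter, HasHeader, Columns, and Prompt follow the table above and may differ in other DuckDB versions.

```sql
-- Create a small sample file.
COPY (SELECT 'Pedro' AS name, 31 AS age) TO 'my_file.csv' (HEADER);

-- Look at a few of the detected properties.
SELECT Delimiter, HasHeader, Columns FROM sniff_csv('my_file.csv');

-- Retrieve the generated read_csv call that reproduces the detected configuration.
SELECT Prompt FROM sniff_csv('my_file.csv');
```

The returned Prompt text can be pasted into a script to pin the configuration, so later reads of the file do not depend on sniffing.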
Detection Steps
Dialect Detection
Dialect detection works by attempting to parse the samples using the set of considered values. The detected dialect
is the dialect that has (1) a consistent number of columns for each row, and (2) the highest number of columns for each row.
The following values are considered for dialect detection:
Parameter   Considered values
delim       , | ; \t
quote       " ' (empty)
escape      " ' \ (empty)
Consider the example file flights.csv:
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
In this example, the system selects the | as the delimiter. All rows are split into the same number of columns, and there is more than one
column per row, meaning the delimiter was actually found in the CSV file.
Type Detection
After detecting the dialect, the system will attempt to figure out the types of each of the columns. Note that this step
is only performed if we are calling read_csv. In case of the COPY statement the types of the table that we are copying into will be used
instead.
The type detection works by attempting to convert the values in each column to the candidate types. If the conversion is unsuccessful, the
candidate type is removed from the set of candidate types for that column. After all samples have been handled, the remaining candidate
type with the highest priority is chosen. The set of considered candidate types in order of priority is given below:
Types
BOOLEAN
BIGINT
DOUBLE
TIME
DATE
TIMESTAMP
VARCHAR
Note that everything can be cast to VARCHAR. This type has the lowest priority, i.e., columns are converted to VARCHAR if they cannot be
cast to anything else. In flights.csv the FlightDate column will be cast to a DATE, while the other columns will be cast to VARCHAR.
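The detected types can be inspected with DESCRIBE. The sketch below recreates flights.csv with the contents shown earlier so that it is self-contained.

```sql
-- Recreate the sample file (pipe-delimited, with a header row).
COPY (
    SELECT * FROM (VALUES
        (DATE '1988-01-01', 'AA', 'New York, NY', 'Los Angeles, CA'),
        (DATE '1988-01-02', 'AA', 'New York, NY', 'Los Angeles, CA')
    ) AS t(FlightDate, UniqueCarrier, OriginCityName, DestCityName)
) TO 'flights.csv' (HEADER, DELIMITER '|');

-- Show the column names and the types chosen by the sniffer.
DESCRIBE SELECT * FROM read_csv('flights.csv');

-- For comparison: skip type detection and keep every column as VARCHAR.
DESCRIBE SELECT * FROM read_csv('flights.csv', all_varchar = true);
```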
The detected types can be individually overridden using the types option. This option takes either a list of types (e.g.,
types=['INT', 'VARCHAR', 'DATE']), which overrides the types of the columns in order of occurrence in the CSV file, or a
name -> type map, which overrides the types of individual columns (e.g., types={'quarter': 'INT'}).
The type detection can be entirely disabled by using the all_varchar option. If this is set, all columns will remain as VARCHAR (as they
originally occur in the CSV file).
Header Detection
Header detection works by checking if the candidate header row deviates from the other rows in the file in terms of types. For example,
in flights.csv, we can see that the header row consists of only VARCHAR columns, whereas the values contain a DATE value for the
FlightDate column. As such, the system defines the first row as the header row and extracts the column names from the header row.
In files that do not have a header row, the column names are generated as column0, column1, etc.
Note that headers cannot be detected correctly if all columns are of type VARCHAR, as in this case the system cannot distinguish the header
row from the other rows in the file. In this case the system assumes the file has no header. This can be overridden using the header
option.
Dates and Timestamps
DuckDB supports the ISO 8601 format by default for timestamps, dates and times. Unfortunately, not all
dates and times are formatted using this standard. For that reason, the CSV reader also supports the dateformat and timestampformat
options. Using these options, the user can specify a format string that specifies how the date or timestamp should be read.
As part of the auto-detection, the system tries to figure out if dates and times are stored in a different representation. This is not always
possible, as there are ambiguities in the representation. For example, the date 01-02-2000 can be parsed as either January 2nd or
February 1st. Often these ambiguities can be resolved. For example, if we later encounter the date 21-02-2000, then we know that the
format must have been DD-MM-YYYY. MM-DD-YYYY is no longer possible as there is no 21st month.
If the ambiguities cannot be resolved by looking at the data, the system has a list of preferences for which date format to use. If the system
chooses incorrectly, the user can specify the dateformat and timestampformat options manually.
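A sketch of resolving such an ambiguity by hand. Both dates in the generated file are valid under DD-MM-YYYY and MM-DD-YYYY, so the format is stated explicitly; the file and column names are invented for the example.

```sql
-- Every value in this file is ambiguous between day-first and month-first.
COPY (SELECT 1 AS id, '01-02-2000' AS flight_date UNION ALL SELECT 2, '03-04-2000')
TO 'dates.csv' (HEADER);

-- State the format explicitly: with %d-%m-%Y, 01-02-2000 is read as 1 February 2000.
SELECT id, flight_date, monthname(flight_date) AS month
FROM read_csv('dates.csv', dateformat = '%d-%m-%Y', types = {'flight_date': 'DATE'});
```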
The system considers the following formats for dates (dateformat). Higher entries are chosen over lower entries in case of ambiguities
(i.e., ISO 8601 is preferred over MM-DD-YYYY).
dateformat
ISO 8601
%y-%m-%d
%Y-%m-%d
%d-%m-%y
%d-%m-%Y
%m-%d-%y
%m-%d-%Y
The system considers the following formats for timestamps (timestampformat). Higher entries are chosen over lower entries in case of
ambiguities.
timestampformat
ISO 8601
%y-%m-%d %H:%M:%S
%Y-%m-%d %H:%M:%S
%d-%m-%y %H:%M:%S
%d-%m-%Y %H:%M:%S
%m-%d-%y %I:%M:%S %p
%m-%d-%Y %I:%M:%S %p
%Y-%m-%d %H:%M:%S.%f
Reading Faulty CSV Files
Reading erroneous CSV files is possible by utilizing the ignore_errors option. With that option set, rows containing data that would
otherwise cause the CSV Parser to generate an error will be ignored.
For example, consider the following CSV file, faulty.csv:
Pedro,31
Oogie Boogie, three
If you read the CSV file, specifying that the first column is a VARCHAR and the second column is an INTEGER, loading the file would fail, as
the string three cannot be converted to an INTEGER.
However, with ignore_errors set, the second row of the file is skipped, outputting only the complete first row. For example:
FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
ignore_errors = true
);
Outputs:
name age
Pedro 31
One should note that the CSV Parser is affected by the projection pushdown optimization. Hence, if we were to select only the name column,
both rows would be considered valid, as the casting error on the age would never occur. For example:
SELECT name
FROM read_csv('faulty.csv', columns = {'name': 'VARCHAR', 'age': 'INTEGER'});
Outputs:
name
Pedro
Oogie Boogie
Being able to read faulty CSV files is important, but for many data cleaning operations, it is also necessary to know exactly which lines are
corrupted and what errors the parser discovered on them. For scenarios like these, it is possible to use DuckDB's CSV Rejects Table feature.
It is important to note that the Rejects Table can only be used when ignore_errors is set. Currently, it only stores casting errors and
does not save errors that occur when the number of columns differs.
Parameters
The parameters listed below are used in the read_csv function to configure the CSV Rejects Table.
| Name | Description | Type | Default |
|:--|:--|:--|:--|
| rejects_table | Name of a temporary table where the information of the faulty lines of a CSV file is stored. | VARCHAR | (empty) |
| rejects_limit | Upper limit on the number of faulty records from a CSV file that will be recorded in the rejects table. 0 is used when no limit should be applied. | BIGINT | 0 |
| rejects_recovery_columns | Column values that serve as a primary key to the CSV file. These are stored in the CSV Rejects Table to help identify the faulty tuples. | VARCHAR[] | (empty) |
To store the information of the faulty CSV lines in a rejects table, the user must simply provide the rejects table name in the
rejects_table option. For example:
FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
rejects_table = 'rejects_table',
ignore_errors = true
);
You can then query the rejects_table table, to retrieve information about the rejected tuples. For example:
FROM rejects_table;
Outputs:
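| file | line | column | column_name | parsed_value | error |
|:--|:--|:--|:--|:--|:--|
| faulty.csv | 2 | 1 | age | three | Could not convert string ' three' to 'INTEGER' |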
Additionally, the name column could also be provided as a primary key via the rejects_recovery_columns option to provide more
information over the faulty lines. For example:
FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
rejects_table = 'rejects_table',
rejects_recovery_columns = '[name]',
ignore_errors = true
);
file        line  column  column_name  parsed_value  recovery_columns         error
faulty.csv  2     1       age          three         {'name': 'Oogie Boogie'}  Could not convert string ' three' to 'INTEGER'
CSV Import Tips
Below is a collection of tips to help when attempting to import complex CSV files. In the examples, we use the flights.csv file.
Override the Header Flag if the Header Is Not Correctly Detected
If a file contains only string columns, the header auto-detection might fail. Provide the header option to override this behavior.
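As a minimal sketch of this tip, assuming that flights.csv does start with a header row, the header option can be passed to read_csv explicitly:

```sql
-- Treat the first line of flights.csv as a header row,
-- regardless of what auto-detection would conclude.
SELECT *
FROM read_csv('flights.csv', header = true);
```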
Provide Names if the File Does Not Contain a Header
If the file does not contain a header, names will be auto-generated by default. You can provide your own names with the names option.
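The sketch below illustrates the names option. The file name flights_no_header.csv is hypothetical, and the four column names are assumed from the flights files shown elsewhere in this documentation:

```sql
-- The file has no header row, so supply one name per column.
SELECT *
FROM read_csv(
    'flights_no_header.csv',
    header = false,
    names = ['FlightDate', 'UniqueCarrier', 'OriginCityName', 'DestCityName']
);
```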
Override the Types of Specific Columns
The types flag can be used to override the types of only certain columns by providing a struct of name -> type mappings.
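For instance, assuming flights.csv has a column named FlightDate, a query of the following shape overrides only that column's type:

```sql
-- Read FlightDate as DATE; the remaining columns are still auto-detected.
SELECT *
FROM read_csv('flights.csv', types = {'FlightDate': 'DATE'});
```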
Use COPY When Loading Data into a Table
The COPY statement copies data directly into a table. The CSV reader uses the schema of the table instead of auto-detecting types from the
file. This speeds up the auto-detection, and prevents mistakes from being made during auto-detection.
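A minimal sketch of this approach follows. The table definition and the pipe delimiter are assumptions based on the flights files shown elsewhere in this documentation:

```sql
-- Create the target table first; its schema is used by the CSV reader.
CREATE TABLE flights (
    FlightDate DATE,
    UniqueCarrier VARCHAR,
    OriginCityName VARCHAR,
    DestCityName VARCHAR
);

-- Load the file into the table, stating the delimiter and header explicitly.
COPY flights FROM 'flights.csv' (DELIMITER '|', HEADER);
```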
Use union_by_name When Loading Files with Different Schemas
The union_by_name option can be used to unify the schema of files that have different or missing columns. For files that do not have
certain columns, NULL values are filled in.
JSON Files
JSON Loading
Examples
JSON Loading
JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects
consisting of attribute-value pairs and arrays (or other serializable values). While it is not a very efficient format for tabular data, it is very
commonly used, especially as a data interchange format.
The DuckDB JSON reader can automatically infer which configuration flags to use by analyzing the JSON file. This will work correctly in most
situations, and should be the first option attempted. In rare situations where the JSON reader cannot figure out the correct configuration,
it is possible to manually configure the JSON reader to correctly parse the JSON file.
Parameters
auto_detect (BOOL, default: false): Whether to auto-detect the names of the keys and the data types of the values.
columns (STRUCT, default: empty): A struct that specifies the key names and value types contained within the JSON file (e.g., {key1: 'INTEGER', key2: 'VARCHAR'}). If auto_detect is enabled, these will be inferred.
compression (VARCHAR, default: 'auto'): The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.json.gz will use gzip, t.json will use none). Options are 'none', 'gzip', 'zstd', and 'auto'.
convert_strings_to_integers (BOOL, default: false): Whether strings representing integer values should be converted to a numerical type.
dateformat (VARCHAR, default: 'iso'): Specifies the date format to use when parsing dates. See Date Format.
filename (BOOL, default: false): Whether or not an extra filename column should be included in the result.
format (VARCHAR, default: 'array'): Can be one of ['auto', 'unstructured', 'newline_delimited', 'array'].
hive_partitioning (BOOL, default: false): Whether or not to interpret the path as a Hive partitioned path.
ignore_errors (BOOL, default: false): Whether to ignore parse errors (only possible when format is 'newline_delimited').
maximum_depth (BIGINT, default: -1): Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types.
maximum_object_size (UINTEGER, default: 16777216): The maximum size of a JSON object (in bytes).
records (VARCHAR, default: 'records'): Can be one of ['auto', 'true', 'false'].
sample_size (UBIGINT, default: 20480): Option to define the number of sample objects for automatic JSON type detection. Set to -1 to scan the entire input file.
timestampformat (VARCHAR, default: 'iso'): Specifies the date format to use when parsing timestamps. See Date Format.
union_by_name (BOOL, default: false): Whether the schemas of multiple JSON files should be unified.
The JSON extension can attempt to determine the format of a JSON file when setting format to auto. Here are some example JSON files
and the corresponding format settings that should be used.
In each of the below cases, the format setting was not needed, as DuckDB was able to infer it correctly, but it is included for illustrative
purposes. A query of this shape would work in each case:
SELECT *
FROM 'filename.json';
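Format: newline_delimited
With format = 'newline_delimited', newline-delimited JSON can be parsed. Each line is a separate JSON value, for example:
{"key1":"value1", "key2": "value1"}
{"key1":"value2", "key2": "value2"}
{"key1":"value3", "key2": "value3"}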
SELECT *
FROM read_json_auto('records.json', format = 'newline_delimited');
key1 key2
value1 value1
value2 value2
value3 value3
Format: array
If the JSON file contains a JSON array of objects (pretty-printed or not), format = 'array' may be used.
[
{"key1":"value1", "key2": "value1"},
{"key1":"value2", "key2": "value2"},
{"key1":"value3", "key2": "value3"}
]
SELECT *
FROM read_json_auto('array.json', format = 'array');
key1 key2
value1 value1
value2 value2
value3 value3
Format: unstructured
If the JSON file contains JSON that is not newline-delimited or an array, unstructured may be used.
{
"key1":"value1",
"key2":"value1"
}
{
"key1":"value2",
"key2":"value2"
}
{
"key1":"value3",
"key2":"value3"
}
SELECT *
FROM read_json_auto('unstructured.json', format = 'unstructured');
key1 key2
value1 value1
value2 value2
value3 value3
The JSON extension can attempt to determine whether a JSON file contains records when setting records = auto. When records =
true, the JSON extension expects JSON objects, and will unpack the fields of JSON objects into individual columns.
SELECT *
FROM read_json_auto('records.json', records = true);
key1 key2
value1 value1
value2 value2
value3 value3
When records = false, the JSON extension will not unpack the top‑level objects, and create STRUCTs instead:
SELECT *
FROM read_json_auto('records.json', records = false);
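json
{'key1': value1, 'key2': value1}
{'key1': value2, 'key2': value2}
{'key1': value3, 'key2': value3}
This is especially useful for JSON that does not consist of objects, for example a file arrays.json whose top-level values are arrays: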
SELECT *
FROM read_json_auto('arrays.json', records = false);
json
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
Writing
The contents of tables or the result of queries can be written directly to a JSON file using the COPY statement. See the COPY documentation
for more information.
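As a small sketch of writing JSON, the statement below writes a query result to a file. The output file name and the selected values are made up for illustration:

```sql
-- Write the result of a query to a JSON file.
COPY (SELECT 42 AS answer, 'duck' AS name) TO 'output.json' (FORMAT JSON);
```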
read_json_auto Function
The read_json_auto function is the simplest method of loading JSON files: it automatically attempts to figure out the correct configuration of
the JSON reader. It also automatically deduces the types of columns.
SELECT *
FROM read_json_auto('todos.json')
LIMIT 5;
The path can either be a relative path (relative to the current working directory) or an absolute path.
If we specify the columns, we can bypass the automatic detection. Note that not all columns need to be specified:
SELECT *
FROM read_json_auto('todos.json',
columns = {userId: 'UBIGINT',
completed: 'BOOLEAN'});
Multiple files can be read at once by providing a glob or a list of files. Refer to the multiple files section for more information.
COPY Statement
The COPY statement can be used to load data from a JSON file into a table. For the COPY statement, we must first create a table with the
correct schema to load the data into. We then specify the JSON file to load from plus any configuration options separately.
CREATE TABLE todos (userId UBIGINT, id UBIGINT, title VARCHAR, completed BOOLEAN);
COPY todos FROM 'todos.json';
SELECT * FROM todos LIMIT 5;
Multiple Files
DuckDB can read multiple files of different types (CSV, Parquet, JSON files) at the same time using either the glob syntax, or by providing a
list of files to read. See the combining schemas page for tips on reading files with different schemas.
CSV
-- read all files with a name ending in ".csv" in the folder "dir"
SELECT * FROM 'dir/*.csv';
-- read all files with a name ending in ".csv", two directories deep
SELECT * FROM '*/*/*.csv';
-- read all files with a name ending in ".csv", at any depth in the folder "dir"
SELECT * FROM 'dir/**/*.csv';
-- read the CSV files 'flights1.csv' and 'flights2.csv'
SELECT * FROM read_csv(['flights1.csv', 'flights2.csv']);
-- read the CSV files 'flights1.csv' and 'flights2.csv', unifying schemas by name and outputting a `filename` column
SELECT * FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);
Parquet
DuckDB can also read a series of Parquet files and treat them as if they were a single table. Note that this only works if the Parquet files
have the same schema. You can specify which Parquet files you want to read using a list parameter, glob pattern matching syntax, or a
combination of both.
List Parameter
The read_parquet function can accept a list of filenames as the input parameter.
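A call with a list parameter might look as follows; the three file names are placeholders:

```sql
-- Read three Parquet files and treat them as a single table.
SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```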
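Glob Syntax
Any file name input to the read_parquet function can either be an exact filename, or use a glob syntax to read multiple files that match a
pattern.
Wildcard  Description
*         matches any number of any characters (including none)
**        matches any number of subdirectories (including none)
?         matches any single character
[abc]     matches one character given in the bracket
[a-z]     matches one character from the range given in the bracket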
Note that the ? wildcard in globs is not supported for reads over S3 due to HTTP encoding issues.
Here is an example that reads all the files that end with .parquet located in the test folder:
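A query of the following shape would do this (a sketch; the test folder is the one named in the sentence above):

```sql
-- Read every file ending in .parquet directly inside the test folder.
SELECT *
FROM read_parquet('test/*.parquet');
```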
List of Globs
The glob syntax and the list input parameter can be combined to scan files that meet one of multiple patterns.
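For illustration, the sketch below combines a list with glob patterns; folder1 and folder2 are hypothetical directory names:

```sql
-- Read all Parquet files from two different folders in one scan.
SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
```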
DuckDB can read multiple CSV files at the same time using either the glob syntax, or by providing a list of files to read.
Filename
The filename argument can be used to add an extra filename column to the result that indicates which row came from which file. For
example:
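One possible form of such a query is sketched below, assuming a folder dir that contains CSV files:

```sql
-- Add a filename column recording the source file of each row.
SELECT *
FROM read_csv('dir/*.csv', filename = true);
```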
The glob pattern matching syntax can also be used to search for filenames using the glob table function. It accepts one parameter: the
path to search (which may include glob patterns).
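As a sketch, the following call lists every file in the current working directory; the pattern '*' is chosen only for illustration:

```sql
-- List the files matching a glob pattern.
SELECT *
FROM glob('*');
```

The result has a single column named file, with one row per matching path.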
file
duckdb.exe
test.csv
test.json
test.parquet
test2.csv
test2.parquet
todos.json
Combining Schemas
Examples
Combining Schemas
When reading from multiple files, we have to combine schemas from those files. That is because each file has its own schema that can
differ from the other files. DuckDB offers two ways of unifying schemas of multiple files: by column position and by column name.
By default, DuckDB reads the schema of the first file provided, and then unifies columns in subsequent files by column position. This works
correctly as long as all files have the same schema. If the schema of the files differs, you might want to use the union_by_name option
to allow DuckDB to construct the schema by reading all of the names instead.
Union by Position
By default, DuckDB unifies the columns of these different files by position. This means that the first column in each file is combined
together, as well as the second column in each file, etc. For example, consider the following two files.
flights1.csv:
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
flights2.csv:
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA
Reading the two files at the same time will produce the following result set:
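A query of this shape reads both files in a single scan. It is a sketch that assumes the two files above are in the current working directory:

```sql
-- Columns are matched by position across the two files.
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv']);
```

Because both files share the same four columns, the result should contain the three data rows of the two files under the column names FlightDate, UniqueCarrier, OriginCityName, and DestCityName.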
Union by Name
If you are processing multiple files that have different schemas, perhaps because columns have been added or renamed, it might be
desirable to unify the columns of different files by name instead. This can be done by providing the union_by_name option. For example,
consider the following two files, where flights4.csv has an extra column (UniqueCarrier).
flights3.csv:
FlightDate|OriginCityName|DestCityName
1988-01-01|New York, NY|Los Angeles, CA
1988-01-02|New York, NY|Los Angeles, CA
flights4.csv:
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA
Reading these files while unifying columns by position results in an error, because the two files have a different number of columns. When
specifying the union_by_name option, the columns are correctly unified, and any missing values are set to NULL.
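The query below sketches this for the two files above, assuming they are in the current working directory:

```sql
-- Columns are matched by name; UniqueCarrier is NULL for rows from flights3.csv.
SELECT *
FROM read_csv(['flights3.csv', 'flights4.csv'], union_by_name = true);
```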
Parquet Files
Examples
-- write the results of a query to a Parquet file using the default compression (Snappy)
COPY
(SELECT * FROM tbl)
TO 'result-snappy.parquet'
(FORMAT 'parquet');
-- write the results from a query to a Parquet file with specific compression and row group size
COPY
(FROM generate_series(100_000))
TO 'test.parquet'
(FORMAT 'parquet', COMPRESSION 'zstd', ROW_GROUP_SIZE 100_000);
Parquet Files
Parquet files are compressed columnar files that are efficient to load and process. DuckDB provides support for both reading and writing
Parquet files in an efficient manner, as well as support for pushing filters and projections into the Parquet file scans.
Note. Parquet data sets differ based on the number of files, the size of individual files, the compression algorithm used, the row group
size, etc. These have a significant effect on performance. Please consult the Performance Guide for details.
read_parquet Function
If your file ends in .parquet, the function syntax is optional. The system will automatically infer that you are reading a Parquet file.
Multiple files can be read at once by providing a glob or a list of files. Refer to the multiple files section for more information.
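The two forms can be sketched as follows, using the test.parquet file name that appears in other examples of this section:

```sql
-- Explicit function call.
SELECT * FROM read_parquet('test.parquet');

-- Equivalent shorthand: the .parquet extension selects the Parquet reader.
SELECT * FROM 'test.parquet';
```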
Parameters
There are a number of options exposed that can be passed to the read_parquet function or the COPY statement.
binary_as_string (BOOL, default: false): Parquet files generated by legacy writers do not correctly set the UTF8 flag for strings, causing string columns to be loaded as BLOB instead. Set this to true to load binary columns as strings.
encryption_config (STRUCT, default: -): Configuration for Parquet encryption.
filename (BOOL, default: false): Whether or not an extra filename column should be included in the result.
file_row_number (BOOL, default: false): Whether or not to include the file_row_number column.
hive_partitioning (BOOL, default: false): Whether or not to interpret the path as a Hive partitioned path.
union_by_name (BOOL, default: false): Whether the columns of multiple schemas should be unified by name, rather than by position.
Partial Reading
DuckDB supports projection pushdown into the Parquet file itself. That is to say, when querying a Parquet file, only the columns required
for the query are read. This allows you to read only the part of the Parquet file that you are interested in. This will be done automatically
by DuckDB.
DuckDB also supports filter pushdown into the Parquet reader. When you apply a filter to a column that is scanned from a Parquet file,
the filter will be pushed down into the scan, and can even be used to skip parts of the file using the built‑in zonemaps. Note that this will
depend on whether or not your Parquet file contains zonemaps.
Filter and projection pushdown provide significant performance benefits. See our blog post on this for more information.
You can also insert the data into a table or create a table from the Parquet file directly. This will load the data from the Parquet file and
insert it into the database.
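Both variants are sketched below. The table names people and people_copy are hypothetical, and the INSERT assumes that people already exists with a schema matching the file:

```sql
-- Append the contents of the Parquet file to an existing table.
INSERT INTO people
    SELECT * FROM read_parquet('test.parquet');

-- Or create a new table directly from the Parquet file.
CREATE TABLE people_copy AS
    SELECT * FROM read_parquet('test.parquet');
```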
If you wish to keep the data stored inside the Parquet file, but want to query the Parquet file directly, you can create a view over the
read_parquet function. You can then query the Parquet file as if it were a built-in table.
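A minimal sketch of such a view follows; the view name parquet_people is made up for this example:

```sql
-- The view stores only the query; the data stays in the Parquet file.
CREATE VIEW parquet_people AS
    SELECT * FROM read_parquet('test.parquet');

-- Query the view like a regular table.
SELECT * FROM parquet_people;
```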
DuckDB also has support for writing to Parquet files using the COPY statement syntax. See the COPY Statement page for details, including
all possible parameters for the COPY statement.
-- write a query to a Parquet file with ZSTD compression (same as CODEC) and row_group_size
COPY
(FROM generate_series(100_000))
TO 'row-groups-zstd.parquet'
(FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100_000);
DuckDB's EXPORT command can be used to export an entire database to a series of Parquet files. See the Export statement documentation
for more details.
Encryption
The support for Parquet files is enabled via extension. The parquet extension is bundled with almost all clients. However, if your client
does not bundle the parquet extension, the extension must be installed and loaded separately.
INSTALL parquet;
LOAD parquet;
Parquet Metadata
The parquet_metadata function can be used to query the metadata contained within a Parquet file, which reveals various internal
details of the Parquet file such as the statistics of the different columns. This can be useful for figuring out what kind of skipping is possible
in Parquet files, or even to obtain a quick overview of what the different columns contain.
SELECT *
FROM parquet_metadata('test.parquet');
Field Type
file_name VARCHAR
row_group_id BIGINT
row_group_num_rows BIGINT
row_group_num_columns BIGINT
row_group_bytes BIGINT
column_id BIGINT
file_offset BIGINT
num_values BIGINT
path_in_schema VARCHAR
type VARCHAR
stats_min VARCHAR
stats_max VARCHAR
stats_null_count BIGINT
stats_distinct_count BIGINT
stats_min_value VARCHAR
stats_max_value VARCHAR
compression VARCHAR
encodings VARCHAR
index_page_offset BIGINT
dictionary_page_offset BIGINT
data_page_offset BIGINT
total_compressed_size BIGINT
total_uncompressed_size BIGINT
key_value_metadata MAP(BLOB, BLOB)
Parquet Schema
The parquet_schema function can be used to query the internal schema contained within a Parquet file. Note that this is the schema
as it is contained within the metadata of the Parquet file. If you want to figure out the column names and types contained within a Parquet
file it is easier to use DESCRIBE.
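Both approaches can be sketched as follows, again using test.parquet as a placeholder file name:

```sql
-- Internal schema as stored in the Parquet metadata.
SELECT *
FROM parquet_schema('test.parquet');

-- Column names and types as DuckDB would read them.
DESCRIBE SELECT * FROM 'test.parquet';
```

The parquet_schema function returns the following fields: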
Field Type
file_name VARCHAR
name VARCHAR
type VARCHAR
type_length VARCHAR
repetition_type VARCHAR
num_children BIGINT
converted_type VARCHAR
scale BIGINT
precision BIGINT
field_id BIGINT
logical_type VARCHAR
The parquet_file_metadata function can be used to query file-level metadata such as the format version and the encryption
algorithm used.
SELECT *
FROM parquet_file_metadata('test.parquet');
Field Type
file_name VARCHAR
created_by VARCHAR
num_rows BIGINT
num_row_groups BIGINT
format_version BIGINT
encryption_algorithm VARCHAR
footer_signing_key_metadata VARCHAR
The parquet_kv_metadata function can be used to query custom metadata defined as key‑value pairs.
SELECT *
FROM parquet_kv_metadata('test.parquet');
Field Type
file_name VARCHAR
key BLOB
value BLOB
Parquet Encryption
Starting with version 0.10.0, DuckDB supports reading and writing encrypted Parquet files. DuckDB broadly follows the Parquet Modular
Encryption specification with some limitations.
Using the PRAGMA add_parquet_key function, named encryption keys of 128, 192, or 256 bits can be added to a session. These keys
are stored in‑memory.
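The statements below sketch how keys of each size could be registered. The key names and key strings are illustrative only (16, 24, and 32 characters for 128, 192, and 256 bits) and should not be used as real secrets:

```sql
-- Register named encryption keys for the current session.
PRAGMA add_parquet_key('key128', '0123456789112345');
PRAGMA add_parquet_key('key192', '012345678911234501234567');
PRAGMA add_parquet_key('key256', '01234567891123450123456789112345');
```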
Writing Encrypted Parquet Files
After specifying the key (e.g., key256), files can be encrypted as follows:
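A write of this kind can be sketched as below, assuming a table named tbl exists and a key named key256 has been added to the session:

```sql
-- Encrypt the footer and all columns with the key registered as key256.
COPY tbl TO 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});
```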
Reading Encrypted Parquet Files
An encrypted Parquet file using a specific key (e.g., key256) can then be read as follows:
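One way to do this is with the COPY statement, sketched here under the assumption that the table tbl already exists with a matching schema and that key256 has been added to the session:

```sql
-- Load the encrypted file into an existing table.
COPY tbl FROM 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});
```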
Or:
SELECT *
FROM read_parquet('tbl.parquet', encryption_config = {footer_key: 'key256'});
Limitations
1. It is not compatible with the encryption of, e.g., PyArrow, until the missing details are implemented.
2. DuckDB encrypts the footer and all columns using the footer_key. The Parquet specification allows encryption of individual
columns with different keys, e.g.:
However, this is unsupported at the moment and will cause an error to be thrown (for now):
Performance Implications
Note that encryption has some performance implications. Without encryption, reading/writing the lineitem table from TPC-H at SF1,
which is 6M rows and 15 columns, from/to a Parquet file takes 0.26 and 0.99 seconds, respectively. With encryption, this takes 0.64 and 2.21
seconds, both approximately 2.5× slower than the unencrypted version.
Parquet Tips
Use union_by_name When Loading Files with Different Schemas
The union_by_name option can be used to unify the schema of files that have different or missing columns. For files that do not have
certain columns, NULL values are filled in.
SELECT *
FROM read_parquet('flights*.parquet', union_by_name = true);
Enabling PER_THREAD_OUTPUT
If the final number of Parquet files is not important, writing one file per thread can significantly improve performance. Using a glob
pattern upon read or a Hive partitioning structure are good ways to transparently handle multiple files.
COPY
(FROM generate_series(10_000_000))
TO 'test.parquet'
(FORMAT PARQUET, PER_THREAD_OUTPUT true);
Selecting a ROW_GROUP_SIZE
The ROW_GROUP_SIZE parameter specifies the minimum number of rows in a Parquet row group, with a minimum value equal to DuckDB's
vector size (currently 2048, but adjustable when compiling DuckDB), and a default of 122,880. A Parquet row group is a partition of rows,
consisting of a column chunk for each column in the dataset.
Compression algorithms are only applied per row group, so the larger the row group size, the more opportunities to compress the data.
DuckDB can read Parquet row groups in parallel even within the same file and uses predicate pushdown to only scan the row groups whose
metadata ranges match the WHERE clause of the query. However, there is some overhead associated with reading the metadata in each
group. A good approach would be to ensure that within each file, the total number of row groups is at least as large as the number of CPU
threads used to query that file. More row groups beyond the thread count would improve the speed of highly selective queries, but slow
down queries that must scan the whole file like aggregations.
Partitioning
Hive Partitioning
Examples
Hive Partitioning
Hive partitioning is a partitioning strategy that is used to split a table into multiple files based on partition keys. The files are organized
into folders. Within each folder, the partition key has a value that is determined by the name of the folder.
Below is an example of a Hive partitioned file hierarchy. The files are partitioned on two keys (year and month).
orders
├── year=2021
│    ├── month=1
│    │   ├── file1.parquet
│    │   └── file2.parquet
│    └── month=2
│        └── file3.parquet
└── year=2022
     ├── month=11
     │   ├── file4.parquet
     │   └── file5.parquet
     └── month=12
         └── file6.parquet
Files stored in this hierarchy can be read using the hive_partitioning flag.
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true);
When we specify the hive_partitioning flag, the values of the columns will be read from the directories.
Filter Pushdown
Filters on the partition keys are automatically pushed down into the files. This way the system skips reading files that are not necessary
to answer a query. For example, consider the following query on the above dataset:
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2022 AND month = 11;
When executing this query, only the following files will be read:
orders
└── year=2022
     └── month=11
         ├── file4.parquet
         └── file5.parquet
Autodetection
By default, the system tries to infer whether the provided files are in a Hive partitioned hierarchy. If so, the hive_partitioning flag is
enabled automatically. The autodetection looks at the names of the folders and searches for a 'key' = 'value' pattern. This behaviour
can be overridden by setting the hive_partitioning flag manually.
Hive Types
hive_types is a way to specify the logical types of the hive partitions in a struct:
SELECT *
FROM read_parquet(
'dir/**/*.parquet',
hive_partitioning = true,
hive_types = {'release': DATE, 'orders': BIGINT}
);
hive_types will be autodetected for the following types: DATE, TIMESTAMP and BIGINT. To switch off the autodetection, the flag
hive_types_autocast = 0 can be set.
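As a sketch of switching off the type autodetection, the flag can be passed alongside hive_partitioning; the path is the same placeholder used in the example above:

```sql
-- Read partition keys without automatically casting them to DATE, TIMESTAMP or BIGINT.
SELECT *
FROM read_parquet(
    'dir/**/*.parquet',
    hive_partitioning = true,
    hive_types_autocast = 0
);
```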
Partitioned Writes
Examples
Partitioned Writes
When the partition_by clause is specified for the COPY statement, the files are written in a Hive partitioned folder hierarchy. The target
is the name of the root directory (in the example above: orders). The files are written in‑order in the file hierarchy. Currently, one file is
written per thread to each directory.
orders
├── year=2021
│    ├── month=1
│    │   ├── data_1.parquet
│    │   └── data_2.parquet
│    └── month=2
│        └── data_1.parquet
└── year=2022
     ├── month=11
     │   ├── data_1.parquet
     │   └── data_2.parquet
     └── month=12
         └── data_1.parquet
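A write that produces a hierarchy of this kind could look as follows. This is a sketch that assumes a table named orders with year and month columns:

```sql
-- Write one folder per (year, month) combination under the orders directory.
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month));
```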
The values of the partitions are automatically extracted from the data. Note that it can be very expensive to write many partitions as many
files will be created. The ideal partition count depends on how large your data set is.
Note. Best practice: Writing data into many small partitions is expensive. It is generally recommended to have at least 100 MB of
data per partition.
Overwriting
By default, the partitioned write will not allow overwriting existing directories. Use the OVERWRITE_OR_IGNORE option to allow
overwriting an existing directory.
Filename Pattern
By default, files will be named data_0.parquet or data_0.csv. With the flag FILENAME_PATTERN, a pattern with {i} or {uuid} can be
defined to create specific filenames:
-- write a table to a Hive partitioned data set of .parquet files, with an index in the filename
COPY orders TO 'orders'
(FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN "orders_{i}");
-- write a table to a Hive partitioned data set of .parquet files, with unique filenames
COPY orders TO 'orders'
(FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN "file_{uuid}");
Appender
The Appender can be used to load bulk data into a DuckDB database. It is currently available in the C, C++, Go, Java, and Rust APIs. The
Appender is tied to a connection, and will use the transaction context of that connection when appending. An Appender always appends
to a single table in the database file.
DuckDB db;
Connection con(db);
// create the table
con.Query("CREATE TABLE people (id INTEGER, name VARCHAR)");
// initialize the appender
Appender appender(con, "people");
The AppendRow function is the easiest way of appending data. It uses recursive templates to allow you to put all the values of a single row
within one function call, as follows:
appender.AppendRow(1, "Mark");
Rows can also be individually constructed using the BeginRow, EndRow and Append methods. This is done internally by AppendRow,
and hence has the same performance characteristics.
appender.BeginRow();
appender.Append<int32_t>(2);
appender.Append<string>("Hannes");
appender.EndRow();
Any values added to the appender are cached prior to being inserted into the database system for performance reasons. That means that,
while appending, the rows might not be immediately visible in the system. The cache is automatically flushed when the appender goes
out of scope or when appender.Close() is called. The cache can also be manually flushed using the appender.Flush() method.
After either Flush or Close is called, all the data has been written to the database system.
While numbers and strings are rather self-explanatory, dates, times and timestamps require some explanation. They can be directly
appended using the methods provided by duckdb::Date, duckdb::Time or duckdb::Timestamp. They can also be appended using
the internal duckdb::Value type; however, this adds some overhead and should be avoided if possible.
If the appender encounters a PRIMARY KEY conflict or a UNIQUE constraint violation, it fails and returns the following error:
Constraint Error: PRIMARY KEY or UNIQUE constraint violated: duplicate key "..."
In this case, the entire append operation fails and no rows are inserted.
The Appender is also available in the following client APIs:
• C
• Go
• JDBC (Java)
• Rust
INSERT Statements
INSERT statements are the standard way of loading data into a relational database. When using INSERT statements, the values are
supplied row‑by‑row. While simple, there is significant overhead involved in parsing and processing individual INSERT statements. This
makes lots of individual row‑by‑row insertions very inefficient for bulk insertion.
Note. Best practice: As a rule of thumb, avoid using lots of individual row-by-row INSERT statements when inserting more than a
few rows (i.e., avoid using INSERT statements as part of a loop). When bulk inserting data, try to maximize the amount of data that
is inserted per statement.
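As a small illustration of inserting several rows per statement, the sketch below uses a hypothetical people table:

```sql
CREATE TABLE people (id INTEGER, name VARCHAR);

-- One statement inserts multiple rows instead of one row per statement.
INSERT INTO people VALUES (1, 'Mark'), (2, 'Hannes');
```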
If you must use INSERT statements to load data in a loop, avoid executing the statements in auto‑commit mode. After every commit,
the database is required to sync the changes made to disk to ensure no data is lost. In auto‑commit mode every single statement will be
wrapped in a separate transaction, meaning fsync will be called for every statement. This is typically unnecessary when bulk loading and
will significantly slow down your program.
Note. If you absolutely must use INSERT statements in a loop to load data, wrap them in calls to BEGIN TRANSACTION and
COMMIT.
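The pattern can be sketched as follows, again with a hypothetical people table; the individual INSERT statements stand in for the body of a loop:

```sql
CREATE TABLE IF NOT EXISTS people (id INTEGER, name VARCHAR);

-- All inserts share one transaction, so changes are committed once at the end.
BEGIN TRANSACTION;
INSERT INTO people VALUES (1, 'Mark');
INSERT INTO people VALUES (2, 'Hannes');
COMMIT;
```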
Syntax
For a more detailed description together with a syntax diagram, see the page on the INSERT statement.
Client APIs
• C
• C++
• Go by marcboeker
• Java
• Julia
• Node.js
• Python
• R
• Rust
• WebAssembly/Wasm
• ADBC API
• ODBC API
There are also contributed third‑party DuckDB wrappers, which currently do not have an official documentation page:
• C# by Giorgi
• Common Lisp by ak‑coram
• Crystal by amauryt
• Ruby by suketa
• Zig by karlseguin
Overview
DuckDB implements a custom C API loosely modelled on the SQLite C API. The API is contained in the duckdb.h header.
Continue to Startup & Shutdown to get started, or check out the Full API overview.
We also provide a SQLite API wrapper, which means that if your application is programmed against the SQLite C API, you can re-link to
DuckDB and it should continue working. See the sqlite_api_wrapper folder in our source repository for more information.
Installation
The DuckDB C API can be installed as part of the libduckdb packages. Please see the installation page for details.
To use DuckDB, you must first initialize a duckdb_database handle using duckdb_open(). duckdb_open() takes as parameter the
database file to read and write from. The special value NULL (nullptr) can be used to create an in‑memory database. Note that for an
in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the process).
With the duckdb_database handle, you can create one or many duckdb_connection using duckdb_connect(). While individual
connections are thread‑safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection
to allow for the best parallel performance.
All duckdb_connections have to explicitly be disconnected with duckdb_disconnect() and the duckdb_database has to be
explicitly closed with duckdb_close() to avoid memory and file handle leaking.
Example
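duckdb_database db;
duckdb_connection con;
if (duckdb_open(NULL, &db) == DuckDBError) { // NULL opens an in-memory database
    // handle error
}
if (duckdb_connect(db, &con) == DuckDBError) {
    // handle error
}
// run queries...
// cleanup
duckdb_disconnect(&con);
duckdb_close(&db);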
API Reference
duckdb_open
Creates a new database or opens an existing database file stored at the given path. If no path is given, a new in-memory
database is created instead. The instantiated database should be closed with 'duckdb_close'.
Syntax
duckdb_state duckdb_open(
const char *path,
duckdb_database *out_database
);
Parameters
• path
Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.
• out_database
• returns
duckdb_open_ext
Extended version of duckdb_open. Creates a new database or opens an existing database file stored at the given
path. The instantiated database should be closed with 'duckdb_close'.
Syntax
duckdb_state duckdb_open_ext(
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
Parameters
• path
Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.
• out_database
• config
• out_error
If set and the function returns DuckDBError, this will contain the reason why the start‑up failed. Note that the error must be freed using
duckdb_free.
• returns
duckdb_close Closes the specified database and de‑allocates all memory allocated for that database. This should be called after you
are done with any database allocated through duckdb_open or duckdb_open_ext. Note that failing to call duckdb_close (in case
of e.g., a program crash) will not cause data corruption. Still, it is recommended to always correctly close a database object after you are
done with it.
Syntax
void duckdb_close(
duckdb_database *database
);
Parameters
• database
duckdb_connect Opens a connection to a database. Connections are required to query the database, and store transactional state
associated with the connection. The instantiated connection should be closed using 'duckdb_disconnect'.
Syntax
duckdb_state duckdb_connect(
duckdb_database database,
duckdb_connection *out_connection
);
Parameters
• database
• out_connection
• returns
duckdb_interrupt Interrupts the query that is currently running on the connection.
Syntax
void duckdb_interrupt(
duckdb_connection connection
);
Parameters
• connection
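duckdb_query_progress Returns the progress of the query that is currently running on the connection.
Syntax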
duckdb_query_progress_type duckdb_query_progress(
duckdb_connection connection
);
Parameters
• connection
• returns
duckdb_disconnect Closes the specified connection and de‑allocates all memory allocated for that connection.
Syntax
void duckdb_disconnect(
duckdb_connection *connection
);
Parameters
• connection
duckdb_library_version Returns the version of the linked DuckDB library, with a version postfix for dev versions.
Usually used for developing C extensions that must return this for a compatibility check.
Syntax
const char *duckdb_library_version(
);
Configuration
Configuration options can be provided to change different settings of the database system. Note that many of these settings can be changed
later on using PRAGMA statements as well. The configuration object should be created, filled with values and passed to duckdb_open_ext.
Example
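duckdb_database db;
duckdb_config config;
// create the configuration object and set an option
if (duckdb_create_config(&config) == DuckDBError) { /* handle error */ }
duckdb_set_config(config, "threads", "8");
// open the database using the configuration, then clean up the configuration object
if (duckdb_open_ext(NULL, &db, config, NULL) == DuckDBError) { /* handle error */ }
duckdb_destroy_config(&config);
// run queries...
duckdb_close(&db); // cleanup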
API Reference
duckdb_create_config Initializes an empty configuration object that can be used to provide start‑up options for the DuckDB instance
through duckdb_open_ext. The duckdb_config must be destroyed using 'duckdb_destroy_config'.
Syntax
duckdb_state duckdb_create_config(
duckdb_config *out_config
);
Parameters
• out_config
• returns
duckdb_config_count This returns the total number of configuration options available for usage with duckdb_get_config_flag.
This should not be called in a loop as it internally loops over all the options.
Syntax
size_t duckdb_config_count(
);
Parameters
• returns
duckdb_get_config_flag Obtains a human‑readable name and description of a specific configuration option. This can be used to
e.g. display configuration options. This will succeed unless index is out of range (i.e., >= duckdb_config_count).
Syntax
duckdb_state duckdb_get_config_flag(
size_t index,
const char **out_name,
const char **out_description
);
Parameters
• index
• out_name
• out_description
• returns
duckdb_set_config Sets the specified option for the specified configuration. The configuration option is indicated by name. To
obtain a list of config options, see duckdb_get_config_flag.
This can fail if either the name is invalid, or if the value provided for the option is invalid.
Syntax
duckdb_state duckdb_set_config(
duckdb_config config,
const char *name,
const char *option
);
Parameters
• config
• name
• option
• returns
duckdb_destroy_config Destroys the specified configuration object and de‑allocates all memory allocated for the object.
Syntax
void duckdb_destroy_config(
duckdb_config *config
);
Parameters
• config
Query
The duckdb_query method allows SQL queries to be run in DuckDB from C. In addition to the connection, this method takes two parameters:
a (null‑terminated) SQL query string and a duckdb_result result pointer. The result pointer may be NULL if the application is not
interested in the result set or if the query produces no result. After the result is consumed, the duckdb_destroy_result method
should be used to clean up the result.
Elements can be extracted from the duckdb_result object using a variety of methods. The duckdb_column_count and duckdb_
row_count methods can be used to extract the number of columns and the number of rows, respectively. duckdb_column_name and
duckdb_column_type can be used to extract the names and types of individual columns.
Example
duckdb_state state;
duckdb_result result;
// create a table
state = duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
if (state == DuckDBError) {
// handle error
}
// insert three rows into the table
state = duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
if (state == DuckDBError) {
// handle error
}
// query rows again
state = duckdb_query(con, "SELECT * FROM integers", &result);
if (state == DuckDBError) {
// handle error
}
// handle the result
// ...
// clean up the result once it has been consumed
duckdb_destroy_result(&result);
Value Extraction
Values can be extracted using either the duckdb_column_data/duckdb_nullmask_data functions, or using the duckdb_value
convenience functions. The duckdb_column_data/duckdb_nullmask_data functions directly hand you a pointer to the result
arrays in columnar format, and can therefore be very fast. The duckdb_value functions perform bounds‑ and type‑checking, and will
automatically cast values to the desired type. This makes them more convenient and easier to use, at the expense of being slower.
Note. For optimal performance, use duckdb_column_data and duckdb_nullmask_data to extract data from the query
result. The duckdb_value functions perform internal type‑checking, bounds‑checking and casting which makes them slower.
duckdb_value Below is an example that prints the above result to CSV format using the duckdb_value_varchar function. Note
that the function is generic: we do not need to know about the types of the individual result columns.
duckdb_column_data Below is an example that prints the above result to CSV format using the duckdb_column_data function.
Note that the function is NOT generic: we do need to know exactly what the types of the result columns are.
Warning. When using duckdb_column_data, be careful that the type matches exactly what you expect it to be. As the code
directly accesses an internal array, there is no type‑checking. Accessing a DUCKDB_TYPE_INTEGER column as if it was a DUCKDB_TYPE_BIGINT
column will provide unpredictable results!
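The columnar access pattern can be sketched without a live database. The block below is a minimal illustration, assuming the data and nullmask arrays of two INTEGER columns have already been obtained (for example from duckdb_column_data, cast to int32_t *, and from duckdb_nullmask_data). The function name format_int_columns_csv and its buffer-based interface are hypothetical and not part of the DuckDB API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of the DuckDB API): formats two INTEGER
 * columns as CSV into `out`. `i_data`/`j_data` stand in for the arrays
 * returned by duckdb_column_data, and `i_null`/`j_null` for the arrays
 * returned by duckdb_nullmask_data (true means the row is NULL).
 * Returns the number of characters written, or -1 if `out` is too small. */
static int format_int_columns_csv(const int32_t *i_data, const bool *i_null,
                                  const int32_t *j_data, const bool *j_null,
                                  size_t row_count, char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t row = 0; row < row_count; row++) {
        char i_buf[16], j_buf[16];
        /* Entries flagged in the nullmask hold undefined data: never read them. */
        if (i_null[row]) snprintf(i_buf, sizeof i_buf, "NULL");
        else snprintf(i_buf, sizeof i_buf, "%d", (int)i_data[row]);
        if (j_null[row]) snprintf(j_buf, sizeof j_buf, "NULL");
        else snprintf(j_buf, sizeof j_buf, "%d", (int)j_data[row]);
        int n = snprintf(out + used, out_size - used, "%s,%s\n", i_buf, j_buf);
        if (n < 0 || (size_t)n >= out_size - used) return -1; /* truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

Because the element type is fixed at compile time (int32_t here), the same helper would misread a BIGINT column, which is the hazard that unchecked columnar access carries.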
API Reference
duckdb_query Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer. If the query
fails to execute, DuckDBError is returned and the error message can be retrieved by calling duckdb_result_error.
Note that after running duckdb_query, duckdb_destroy_result must be called on the result object even if the query fails,
otherwise the error stored within the result will not be freed correctly.
Syntax
duckdb_state duckdb_query(
duckdb_connection connection,
const char *query,
duckdb_result *out_result
);
Parameters
• connection
• query
• out_result
• returns
duckdb_destroy_result Closes the result and de‑allocates all memory allocated for that result.
Syntax
void duckdb_destroy_result(
duckdb_result *result
);
Parameters
• result
duckdb_column_name Returns the column name of the specified column. The result does not need to be freed; the column names
will automatically be destroyed when the result is destroyed.
Syntax
const char *duckdb_column_name(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_column_type Returns the column type of the specified column.
Syntax
duckdb_type duckdb_column_type(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_result_statement_type Returns the statement type of the statement that was executed.
Syntax
duckdb_statement_type duckdb_result_statement_type(
duckdb_result result
);
Parameters
• result
• returns
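duckdb_column_logical_type Returns the logical column type of the specified column. The result should be destroyed with duckdb_destroy_logical_type.
Syntax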
duckdb_logical_type duckdb_column_logical_type(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_column_count Returns the number of columns present in the result object.
Syntax
idx_t duckdb_column_count(
duckdb_result *result
);
Parameters
• result
• returns
duckdb_row_count Returns the number of rows present in the result object.
Syntax
idx_t duckdb_row_count(
duckdb_result *result
);
Parameters
• result
• returns
duckdb_rows_changed Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE
queries. For other queries the rows_changed will be 0.
Syntax
idx_t duckdb_rows_changed(
duckdb_result *result
);
Parameters
• result
• returns
duckdb_column_data Returns the data of a specific column of a result in columnar format.
The function returns a dense array which contains the result data. The exact type stored in the array depends on the corresponding
duckdb_type (as provided by duckdb_column_type). For the exact type by which the data should be accessed, see the comments in the types
section or the DUCKDB_TYPE enum.
For example, for a column of type DUCKDB_TYPE_INTEGER, rows can be accessed in the following manner:
Syntax
void *duckdb_column_data(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_nullmask_data Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row
whether or not the corresponding row is NULL. If a row is NULL, the values present in the array provided by duckdb_column_data are undefined.
Syntax
bool *duckdb_nullmask_data(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_result_error Returns the error message contained within the result. The error is only set if duckdb_query returns
DuckDBError.
The result of this function must not be freed. It will be cleaned up when duckdb_destroy_result is called.
Syntax
const char *duckdb_result_error(
duckdb_result *result
);
Parameters
• result
• returns
Data Chunks
Data chunks represent a horizontal slice of a table. They hold a number of vectors, each of which can hold up to VECTOR_SIZE rows. The
vector size can be obtained through the duckdb_vector_size function and is configurable, but is usually set to 2048.
Data chunks and vectors are what DuckDB uses natively to store and represent data. For this reason, the data chunk interface is the most
efficient way of interfacing with DuckDB. Be aware, however, that correctly interfacing with DuckDB using the data chunk API does require
knowledge of DuckDB's internal vector format.
The primary manner of interfacing with data chunks is by obtaining the internal vectors of the data chunk using the duckdb_data_
chunk_get_vector method, and subsequently using the duckdb_vector_get_data and duckdb_vector_get_validity
methods to read the internal data and the validity mask of the vector. For composite types (list and struct vectors), duckdb_list_
vector_get_child and duckdb_struct_vector_get_child should be used to read child vectors.
API Reference
Vector Interface
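duckdb_create_data_chunk Creates an empty data chunk with the specified set of types. The result must be destroyed with duckdb_destroy_data_chunk.
Syntax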
duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);
Parameters
• types
• column_count
• returns
duckdb_destroy_data_chunk Destroys the data chunk and de‑allocates all memory allocated for that chunk.
Syntax
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);
Parameters
• chunk
duckdb_data_chunk_reset Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
Syntax
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);
Parameters
• chunk
duckdb_data_chunk_get_column_count Retrieves the number of columns in a data chunk.
Syntax
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);
Parameters
• chunk
• returns
duckdb_data_chunk_get_vector Retrieves the vector at the specified column index in the data chunk.
The pointer to the vector is valid for as long as the chunk is alive. It does NOT need to be destroyed.
Syntax
duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);
Parameters
• chunk
• returns
The vector
duckdb_data_chunk_get_size Retrieves the current number of tuples in a data chunk.
Syntax
idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);
Parameters
• chunk
• returns
duckdb_data_chunk_set_size Sets the current number of tuples in a data chunk.
Syntax
void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);
Parameters
• chunk
• size
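duckdb_vector_get_column_type Retrieves the column type of the specified vector. The result must be destroyed with duckdb_destroy_logical_type.
Syntax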
duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);
Parameters
• vector
• returns
duckdb_vector_get_data Retrieves the data pointer of the vector.
The data pointer can be used to read or write values from the vector. How to read or write values depends on the type of the vector.
Syntax
void *duckdb_vector_get_data(
duckdb_vector vector
);
Parameters
• vector
• returns
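duckdb_vector_get_validity Retrieves the validity mask pointer of the specified vector. If all values are valid, this function might return NULL.
The validity mask is a bitset that signifies null‑ness within the data chunk. It is a series of uint64_t values, where each uint64_t value contains
validity for 64 tuples. The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).
Validity of a specific value can be obtained like this:
idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry);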
Syntax
uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);
Parameters
• vector
• returns
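The bit arithmetic behind the validity mask can be tried out on a plain uint64_t array. The following is a minimal sketch; the helper names row_is_valid, set_row_validity and count_nulls are hypothetical stand-ins written for this illustration and are not the DuckDB implementations of the similarly named API functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if `row` is valid (not NULL). A NULL mask pointer means no
 * mask has been allocated, which this sketch treats as "all rows valid". */
static bool row_is_valid(const uint64_t *validity, size_t row) {
    if (validity == NULL) return true;
    size_t entry_idx = row / 64;     /* which uint64_t holds the bit */
    size_t idx_in_entry = row % 64;  /* which bit inside that uint64_t */
    /* Shift a 64-bit one: shifting a plain int literal is undefined for
     * bit positions of 31 and above. */
    return (validity[entry_idx] & ((uint64_t)1 << idx_in_entry)) != 0;
}

/* Marks `row` as valid (bit set to 1) or invalid (bit set to 0). */
static void set_row_validity(uint64_t *validity, size_t row, bool valid) {
    assert(validity != NULL);
    uint64_t bit = (uint64_t)1 << (row % 64);
    if (valid) validity[row / 64] |= bit;
    else validity[row / 64] &= ~bit;
}

/* Counts the NULL rows among the first `count` rows. */
static size_t count_nulls(const uint64_t *validity, size_t count) {
    size_t nulls = 0;
    for (size_t row = 0; row < count; row++) {
        if (!row_is_valid(validity, row)) nulls++;
    }
    return nulls;
}
```

A mask covering n rows needs (n + 63) / 64 entries; initializing every entry to UINT64_MAX marks all rows as valid.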
duckdb_vector_ensure_validity_writable Ensures the validity mask is writable by allocating it.
After this function is called, duckdb_vector_get_validity will ALWAYS return non‑NULL. This allows null values to be written to the
vector, regardless of whether a validity mask was present before.
Syntax
void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);
Parameters
• vector
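duckdb_vector_assign_string_element Assigns a string element in the vector at the specified location.
Syntax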
void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);
Parameters
• vector
• index
• str
duckdb_vector_assign_string_element_len Assigns a string element in the vector at the specified location. You may also
use this function to assign BLOBs.
Syntax
void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);
Parameters
• vector
• index
• str
The string
• str_len
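duckdb_list_vector_get_child Retrieves the child vector of a list vector. The resulting vector is valid as long as the parent vector is valid.
Syntax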
duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);
Parameters
• vector
The vector
• returns
duckdb_list_vector_get_size Returns the size of the child vector of the list.
Syntax
idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);
Parameters
• vector
The vector
• returns
duckdb_list_vector_set_size Sets the total size of the underlying child‑vector of a list vector.
Syntax
duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);
Parameters
• vector
• size
• returns
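duckdb_list_vector_reserve Sets the total capacity of the underlying child‑vector of a list vector.
Syntax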
duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);
Parameters
• vector
• required_capacity
• returns
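duckdb_struct_vector_get_child Retrieves the child vector of a struct vector. The resulting vector is valid as long as the parent vector is valid.
Syntax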
duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);
Parameters
• vector
The vector
• index
• returns
duckdb_array_vector_get_child Retrieves the child vector of an array vector.
The resulting vector is valid as long as the parent vector is valid. The resulting vector has the size of the parent vector multiplied by the
array size.
Syntax
duckdb_vector duckdb_array_vector_get_child(
duckdb_vector vector
);
Parameters
• vector
The vector
• returns
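Since the child vector holds the parent's row count multiplied by the array size, each row's elements can be located by simple index arithmetic. The sketch below assumes the elements of consecutive rows are stored back to back in the child data; the helper names array_child_index and sum_array_row are hypothetical, and child_data stands in for the int32_t data pointer of the child vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of element `elem` of row `row` inside the flattened child data of
 * a fixed-size array column (assumes rows are stored back to back). */
static size_t array_child_index(size_t row, size_t array_size, size_t elem) {
    assert(elem < array_size);
    return row * array_size + elem;
}

/* Sums the elements of one row of an INTEGER[array_size] column. */
static int64_t sum_array_row(const int32_t *child_data, size_t row, size_t array_size) {
    int64_t sum = 0;
    for (size_t elem = 0; elem < array_size; elem++) {
        sum += child_data[array_child_index(row, array_size, elem)];
    }
    return sum;
}
```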
duckdb_validity_row_is_valid Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.
Syntax
bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);
Parameters
• validity
• row
• returns
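duckdb_validity_set_row_validity In a validity mask, sets a specific row to either valid or invalid.
Syntax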
void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);
Parameters
• validity
• row
• valid
duckdb_validity_set_row_invalid In a validity mask, sets a specific row to invalid.
Syntax
void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);
Parameters
• validity
• row
duckdb_validity_set_row_valid In a validity mask, sets a specific row to valid.
Syntax
void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);
Parameters
• validity
• row
Values
API Reference
duckdb_destroy_value Destroys the value and de‑allocates all memory allocated for that value.
Syntax
void duckdb_destroy_value(
duckdb_value *value
);
Parameters
• value
duckdb_create_varchar Creates a value from a null‑terminated string.
Syntax
duckdb_value duckdb_create_varchar(
const char *text
);
Parameters
• value
• returns
duckdb_create_varchar_length Creates a value from a string of the given length.
Syntax
duckdb_value duckdb_create_varchar_length(
const char *text,
idx_t length
);
Parameters
• value
The text
• length
• returns
duckdb_create_int64 Creates a value from an int64.
Syntax
duckdb_value duckdb_create_int64(
int64_t val
);
Parameters
• value
• returns
duckdb_create_struct_value Creates a struct value from a type and an array of values.
Syntax
duckdb_value duckdb_create_struct_value(
duckdb_logical_type type,
duckdb_value *values
);
Parameters
• type
• values
• returns
duckdb_create_list_value Creates a list value from a type and an array of values of length value_count.
Syntax
duckdb_value duckdb_create_list_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
Parameters
• type
• values
• value_count
• returns
duckdb_create_array_value Creates an array value from a type and an array of values of length value_count.
Syntax
duckdb_value duckdb_create_array_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
Parameters
• type
• values
• value_count
• returns
duckdb_get_varchar Obtains a string representation of the given value. The result must be destroyed with duckdb_free.
Syntax
char *duckdb_get_varchar(
duckdb_value value
);
Parameters
• value
The value
• returns
duckdb_get_int64 Obtains an int64 of the given value.
Syntax
int64_t duckdb_get_int64(
duckdb_value value
);
Parameters
• value
The value
• returns
Types
DuckDB is a strongly typed database system. As such, every column has a single type specified. This type is constant over the entire column.
That is to say, a column that is labeled as an INTEGER column will only contain INTEGER values.
DuckDB also supports columns of composite types. For example, it is possible to define an array of integers (INT[]). It is also possible to
define types as arbitrary structs (ROW(i INTEGER, j VARCHAR)). For that reason, native DuckDB type objects are not mere enums,
but a class that can potentially be nested.
Types in the C API are modeled using an enum (duckdb_type) and a complex class (duckdb_logical_type). For most primitive
types, e.g., integers or varchars, the enum is sufficient. For more complex types, such as lists, structs or decimals, the logical type must be
used.
Functions
The enum type of a column in the result can be obtained using the duckdb_column_type function. The logical type of a column can be
obtained using the duckdb_column_logical_type function.
duckdb_value The duckdb_value functions will auto‑cast values as required. For example, it is no problem to use duckdb_value_double
on a column of type DUCKDB_TYPE_INTEGER. The value will be auto‑cast and returned as a double. Note that in certain
cases the cast may fail. For example, this can happen if we request a duckdb_value_int8 and the value does not fit within an int8
value. In this case, a default value will be returned (usually 0 or nullptr). The same default value will also be returned if the corresponding
value is NULL.
The duckdb_value_is_null function can be used to check if a specific value is NULL or not.
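The need for a separate NULL check follows from the default-value rule: a failed cast, a NULL and a genuine 0 all produce the same return value. The sketch below models that rule in plain C; to_int8_or_default and is_ambiguous_zero are hypothetical stand-ins for illustration and do not reproduce the DuckDB implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Narrows a 64-bit value to int8_t, returning the default value 0 when the
 * value does not fit or when the source value is NULL. */
static int8_t to_int8_or_default(int64_t value, bool is_null) {
    if (is_null) return 0;
    if (value < INT8_MIN || value > INT8_MAX) return 0;
    return (int8_t)value;
}

/* True when the returned 0 does not reflect a genuine stored 0, i.e. when it
 * came from a NULL or from a value that was out of range. */
static bool is_ambiguous_zero(int64_t value, bool is_null) {
    return to_int8_or_default(value, is_null) == 0 && (is_null || value != 0);
}
```

In application code, calling duckdb_value_is_null before interpreting a returned 0 removes the NULL case from this ambiguity.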
The exception to the auto‑cast rule is the duckdb_value_varchar_internal function. This function does not auto‑cast and only
works for VARCHAR columns. The reason this function exists is that the result does not need to be freed.
Note. duckdb_value_varchar and duckdb_value_blob require the result to be de‑allocated using duckdb_free.
duckdb_result_get_chunk The duckdb_result_get_chunk function can be used to read data chunks from a DuckDB result
set, and is the most efficient way of reading data from a DuckDB result using the C API. It is also the only way of reading data of certain types
from a DuckDB result. For example, the duckdb_value functions do not support structural reading of composite types (lists or structs)
or more complex types like enums and decimals.
For more information about data chunks, see the documentation on data chunks.
API Reference
Date/Time/Timestamp Helpers
Hugeint Helpers
Decimal Helpers
duckdb_result_get_chunk Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result
is exhausted.
This function supersedes all duckdb_value functions, as well as the duckdb_column_data and duckdb_nullmask_data functions.
It results in significantly better performance, and should be preferred in newer code‑bases.
If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy
result functions).
Use duckdb_result_chunk_count to figure out how many chunks there are in the result.
Syntax
duckdb_data_chunk duckdb_result_get_chunk(
duckdb_result result,
idx_t chunk_index
);
Parameters
• result
• chunk_index
• returns
The resulting data chunk. Returns NULL if the chunk index is out of bounds.
duckdb_result_is_streaming Checks whether the internal result is a streaming query result.
Syntax
bool duckdb_result_is_streaming(
duckdb_result result
);
Parameters
• result
• returns
duckdb_result_chunk_count Returns the number of data chunks present in the result.
Syntax
idx_t duckdb_result_chunk_count(
duckdb_result result
);
Parameters
• result
• returns
duckdb_result_return_type Returns the return type of the given result.
Syntax
duckdb_result_type duckdb_result_return_type(
duckdb_result result
);
Parameters
• result
• returns
The return_type
duckdb_from_date Decompose a duckdb_date object into year, month and day (stored as duckdb_date_struct).
Syntax
duckdb_date_struct duckdb_from_date(
duckdb_date date
);
Parameters
• date
• returns
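To give a feel for what such a decomposition involves, the sketch below converts a day count into a calendar date. It assumes the date is stored as the number of days since 1970-01-01 and uses a widely known days-to-civil-date algorithm (Howard Hinnant's civil_from_days), swapped in for illustration; DuckDB's own implementation may differ. The date_parts struct and decompose_date function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct for this sketch; the real API returns duckdb_date_struct. */
typedef struct {
    int32_t year;
    int8_t month; /* 1-12 */
    int8_t day;   /* 1-31 */
} date_parts;

/* Converts days since 1970-01-01 into a proleptic Gregorian calendar date. */
static date_parts decompose_date(int32_t days_since_epoch) {
    /* Shift the origin to 0000-03-01 so that leap days fall at the end of a year. */
    int64_t z = (int64_t)days_since_epoch + 719468;
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;      /* 400-year cycle */
    int64_t doe = z - era * 146097;                        /* day of era: [0, 146096] */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* [0, 399] */
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* day of March-based year */
    int64_t mp = (5 * doy + 2) / 153;                      /* March-based month: [0, 11] */
    date_parts parts;
    parts.day = (int8_t)(doy - (153 * mp + 2) / 5 + 1);
    parts.month = (int8_t)(mp < 10 ? mp + 3 : mp - 9);
    /* January and February belong to the following calendar year. */
    parts.year = (int32_t)(yoe + era * 400 + (parts.month <= 2 ? 1 : 0));
    return parts;
}
```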
duckdb_to_date Re‑compose a duckdb_date from year, month and day (duckdb_date_struct).
Syntax
duckdb_date duckdb_to_date(
duckdb_date_struct date
);
Parameters
• date
• returns
duckdb_is_finite_date Tests a duckdb_date to see if it is a finite value.
Syntax
bool duckdb_is_finite_date(
duckdb_date date
);
Parameters
• date
• returns
duckdb_from_time Decompose a duckdb_time object into hour, minute, second and microsecond (stored as duckdb_time_struct).
Syntax
duckdb_time_struct duckdb_from_time(
duckdb_time time
);
Parameters
• time
• returns
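The decomposition of a time value is plain integer arithmetic. The sketch below assumes the time is stored as microseconds since midnight; the time_parts struct and the decompose_time and compose_time functions are hypothetical names for this illustration, whereas the real API works with duckdb_time and duckdb_time_struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct for this sketch. */
typedef struct {
    int8_t hour;
    int8_t min;
    int8_t sec;
    int32_t micros;
} time_parts;

#define MICROS_PER_SEC INT64_C(1000000)
#define MICROS_PER_DAY (INT64_C(86400) * MICROS_PER_SEC)

/* Splits a time of day, given as microseconds since midnight, into its parts. */
static time_parts decompose_time(int64_t micros_since_midnight) {
    assert(micros_since_midnight >= 0 && micros_since_midnight < MICROS_PER_DAY);
    time_parts parts;
    int64_t total_secs = micros_since_midnight / MICROS_PER_SEC;
    parts.micros = (int32_t)(micros_since_midnight % MICROS_PER_SEC);
    parts.hour = (int8_t)(total_secs / 3600);
    parts.min = (int8_t)((total_secs % 3600) / 60);
    parts.sec = (int8_t)(total_secs % 60);
    return parts;
}

/* Inverse operation: re-composes the microsecond count from its parts. */
static int64_t compose_time(time_parts parts) {
    int64_t secs = (int64_t)parts.hour * 3600 + (int64_t)parts.min * 60 + parts.sec;
    return secs * MICROS_PER_SEC + parts.micros;
}
```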
duckdb_create_time_tz Creates a duckdb_time_tz object from micros and a timezone offset.
Syntax
duckdb_time_tz duckdb_create_time_tz(
int64_t micros,
int32_t offset
);
Parameters
• micros
• offset
• returns
duckdb_from_time_tz Decomposes a TIME_TZ object into micros and a timezone offset.
Use duckdb_from_time to further decompose the micros into hour, minute, second and microsecond.
Syntax
duckdb_time_tz_struct duckdb_from_time_tz(
duckdb_time_tz micros
);
Parameters
• micros
• out_micros
• out_offset
duckdb_to_time Re‑compose a duckdb_time from hour, minute, second and microsecond (duckdb_time_struct).
Syntax
duckdb_time duckdb_to_time(
duckdb_time_struct time
);
Parameters
• time
• returns
duckdb_from_timestamp Decompose a duckdb_timestamp object into a duckdb_timestamp_struct.
Syntax
duckdb_timestamp_struct duckdb_from_timestamp(
duckdb_timestamp ts
);
Parameters
• ts
• returns
duckdb_to_timestamp Re‑compose a duckdb_timestamp from a duckdb_timestamp_struct.
Syntax
duckdb_timestamp duckdb_to_timestamp(
duckdb_timestamp_struct ts
);
Parameters
• ts
• returns
duckdb_is_finite_timestamp Tests a duckdb_timestamp to see if it is a finite value.
Syntax
bool duckdb_is_finite_timestamp(
duckdb_timestamp ts
);
Parameters
• ts
• returns
duckdb_hugeint_to_double Converts a duckdb_hugeint object (as obtained from a DUCKDB_TYPE_HUGEINT column) into a
double.
Syntax
double duckdb_hugeint_to_double(
duckdb_hugeint val
);
Parameters
• val
• returns
duckdb_double_to_hugeint Converts a double value to a duckdb_hugeint object.
If the conversion fails because the double value is too big, the result will be 0.
Syntax
duckdb_hugeint duckdb_double_to_hugeint(
double val
);
Parameters
• val
• returns
duckdb_double_to_decimal Converts a double value to a duckdb_decimal object.
If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.
Syntax
duckdb_decimal duckdb_double_to_decimal(
double val,
uint8_t width,
uint8_t scale
);
Parameters
• val
• returns
duckdb_decimal_to_double Converts a duckdb_decimal object (as obtained from a DUCKDB_TYPE_DECIMAL column) into a
double.
Syntax
double duckdb_decimal_to_double(
duckdb_decimal val
);
Parameters
• val
• returns
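Conceptually, a decimal with a given scale is an integer that has been multiplied by 10 to the power of the scale, so converting it to a double means dividing that factor back out. The sketch below illustrates the idea under a simplifying assumption: the unscaled integer fits in an int64_t, whereas duckdb_decimal carries a duckdb_hugeint. The function name unscaled_to_double is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Converts an unscaled decimal integer with `scale` fractional digits to a
 * double; e.g. unscaled 12345 with scale 2 represents 123.45. */
static double unscaled_to_double(int64_t unscaled, uint8_t scale) {
    double divisor = 1.0;
    for (uint8_t i = 0; i < scale; i++) {
        divisor *= 10.0; /* builds 10^scale */
    }
    return (double)unscaled / divisor;
}
```

As with any conversion to double, values with many significant digits may lose precision.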
duckdb_create_logical_type Creates a duckdb_logical_type from a standard primitive type. The resulting type should be
destroyed with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_logical_type(
duckdb_type type
);
Parameters
• type
• returns
duckdb_logical_type_get_alias Returns the alias of a duckdb_logical_type, if one is set, else NULL. The result must be destroyed
with duckdb_free.
Syntax
char *duckdb_logical_type_get_alias(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_create_list_type Creates a list type from its child type. The resulting type should be destroyed with duckdb_destroy_
logical_type.
Syntax
duckdb_logical_type duckdb_create_list_type(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_create_array_type Creates an array type from its child type. The resulting type should be destroyed with
duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_array_type(
duckdb_logical_type type,
idx_t array_size
);
Parameters
• type
• array_size
• returns
duckdb_create_map_type Creates a map type from its key type and value type. The resulting type should be destroyed with
duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_map_type(
duckdb_logical_type key_type,
duckdb_logical_type value_type
);
Parameters
• key_type
• value_type
• returns
duckdb_create_union_type Creates a UNION type from the passed types array. The resulting type should be destroyed with
duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_union_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
Parameters
• member_types
• member_names
• member_count
• returns
duckdb_create_struct_type Creates a STRUCT type from the passed member name and type arrays. The resulting type should
be destroyed with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_struct_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
Parameters
• member_types
• member_names
• member_count
• returns
duckdb_create_enum_type Creates an ENUM type from the passed member name array. The resulting type should be destroyed
with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_enum_type(
const char **member_names,
idx_t member_count
);
Parameters
• enum_name
• member_names
• member_count
• returns
duckdb_create_decimal_type Creates a duckdb_logical_type of type decimal with the specified width and scale. The resulting
type should be destroyed with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_decimal_type(
uint8_t width,
uint8_t scale
);
Parameters
• width
• scale
• returns
duckdb_get_type_id Retrieves the enum type class of a duckdb_logical_type.
Syntax
duckdb_type duckdb_get_type_id(
duckdb_logical_type type
);
Parameters
• type
• returns
The type id
duckdb_decimal_width Retrieves the width of a decimal type.
Syntax
uint8_t duckdb_decimal_width(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_decimal_scale Retrieves the scale of a decimal type.
Syntax
uint8_t duckdb_decimal_scale(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_decimal_internal_type Retrieves the internal storage type of a decimal type.
Syntax
duckdb_type duckdb_decimal_internal_type(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_enum_internal_type Retrieves the internal storage type of an enum type.
Syntax
duckdb_type duckdb_enum_internal_type(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_enum_dictionary_size Retrieves the dictionary size of the enum type.
Syntax
uint32_t duckdb_enum_dictionary_size(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_enum_dictionary_value Retrieves the dictionary value at the specified position from the enum.
Syntax
char *duckdb_enum_dictionary_value(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
The string value of the enum type. Must be freed with duckdb_free.
duckdb_list_type_child_type Retrieves the child type of the given list type.
Syntax
duckdb_logical_type duckdb_list_type_child_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The child type of the list type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_array_type_child_type Retrieves the child type of the given array type.
Syntax
duckdb_logical_type duckdb_array_type_child_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The child type of the array type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_array_type_array_size Retrieves the array size of the given array type.
Syntax
idx_t duckdb_array_type_array_size(
duckdb_logical_type type
);
Parameters
• type
• returns
The fixed number of elements the values of this array type can store.
duckdb_map_type_key_type Retrieves the key type of the given map type.
Syntax
duckdb_logical_type duckdb_map_type_key_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The key type of the map type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_map_type_value_type Retrieves the value type of the given map type.
Syntax
duckdb_logical_type duckdb_map_type_value_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The value type of the map type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_struct_type_child_count Returns the number of children of a struct type.
Syntax
idx_t duckdb_struct_type_child_count(
duckdb_logical_type type
);
Parameters
• type
• returns
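duckdb_struct_type_child_name Retrieves the name of the struct child. The result must be freed with duckdb_free.
Syntax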
char *duckdb_struct_type_child_name(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
duckdb_struct_type_child_type Retrieves the child type of the given struct type at the specified index.
Syntax
duckdb_logical_type duckdb_struct_type_child_type(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
The child type of the struct type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_union_type_member_count Returns the number of members that the union type has.
Syntax
idx_t duckdb_union_type_member_count(
duckdb_logical_type type
);
Parameters
• type
• returns
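duckdb_union_type_member_name Retrieves the name of the union member. The result must be freed with duckdb_free.
Syntax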
char *duckdb_union_type_member_name(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
duckdb_union_type_member_type Retrieves the child type of the given union member at the specified index.
Syntax
duckdb_logical_type duckdb_union_type_member_type(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
The child type of the union member. Must be destroyed with duckdb_destroy_logical_type.
duckdb_destroy_logical_type Destroys the logical type and de‑allocates all memory allocated for that type.
Syntax
void duckdb_destroy_logical_type(
duckdb_logical_type *type
);
Parameters
• type
Prepared Statements
A prepared statement is a parameterized query. The query is prepared with question marks (?) or dollar symbols ($1) indicating the
parameters of the query. Values can then be bound to these parameters, after which the prepared statement can be executed using those
parameters. A single query can be prepared once and executed many times.
Prepared statements are useful to:
• Easily supply parameters to functions while avoiding string concatenation/SQL injection attacks.
• Speed up queries that will be executed many times with different parameters.
DuckDB supports prepared statements in the C API with the duckdb_prepare method. The duckdb_bind family of functions is used
to supply values for subsequent execution of the prepared statement using duckdb_execute_prepared. After we are done with the
prepared statement it can be cleaned up using the duckdb_destroy_prepare method.
Example
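duckdb_prepared_statement stmt;
duckdb_result result;
if (duckdb_prepare(con, "INSERT INTO integers VALUES ($1, $2)", &stmt) == DuckDBError) {
// handle error
}
// bind the values; the parameter index starts counting at 1
duckdb_bind_int32(stmt, 1, 42);
duckdb_bind_int32(stmt, 2, 43);
// execute the prepared statement and store its result
if (duckdb_execute_prepared(stmt, &result) == DuckDBError) {
// handle error
}
// clean up
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);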
After calling duckdb_prepare, the prepared statement parameters can be inspected using duckdb_nparams and duckdb_param_
type. In case the prepare fails, the error can be obtained through duckdb_prepare_error.
It is not required that the duckdb_bind family of functions matches the prepared statement parameter type exactly. The values will be
auto‑cast to the required type as needed. For example, calling duckdb_bind_int8 on a parameter type of DUCKDB_TYPE_INTEGER
will work as expected.
Warning. Do not use prepared statements to insert large amounts of data into DuckDB. Instead it is recommended to use the
Appender.
API Reference
duckdb_prepare Creates a prepared statement object from a query.
Note that after calling duckdb_prepare, the prepared statement should always be destroyed using duckdb_destroy_prepare,
even if the prepare fails.
If the prepare fails, duckdb_prepare_error can be called to obtain the reason why the prepare failed.
Syntax
duckdb_state duckdb_prepare(
duckdb_connection connection,
const char *query,
duckdb_prepared_statement *out_prepared_statement
);
Parameters
• connection
• query
• out_prepared_statement
• returns
duckdb_destroy_prepare Closes the prepared statement and de‑allocates all memory allocated for the statement.
Syntax
void duckdb_destroy_prepare(
duckdb_prepared_statement *prepared_statement
);
Parameters
• prepared_statement
duckdb_prepare_error Returns the error message associated with the given prepared statement. If the prepared statement has no
error message, this returns nullptr instead.
The error message should not be freed. It will be de‑allocated when duckdb_destroy_prepare is called.
Syntax
const char *duckdb_prepare_error(
duckdb_prepared_statement prepared_statement
);
Parameters
• prepared_statement
• returns
duckdb_nparams Returns the number of parameters that can be provided to the given prepared statement.
Syntax
idx_t duckdb_nparams(
duckdb_prepared_statement prepared_statement
);
Parameters
• prepared_statement
duckdb_parameter_name Returns the name used to identify the parameter. The returned string should be freed using duckdb_free.
Returns NULL if the index is out of range for the provided prepared statement.
Syntax
const char *duckdb_parameter_name(
duckdb_prepared_statement prepared_statement,
idx_t index
);
Parameters
• prepared_statement
The prepared statement for which to get the parameter name.
duckdb_param_type Returns the parameter type for the parameter at the given index.
Returns DUCKDB_TYPE_INVALID if the parameter index is out of range or the statement was not successfully prepared.
Syntax
duckdb_type duckdb_param_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
Parameters
• prepared_statement
• param_idx
• returns
duckdb_clear_bindings Clears the parameters bound to the prepared statement.
Syntax
duckdb_state duckdb_clear_bindings(
duckdb_prepared_statement prepared_statement
);
duckdb_prepared_statement_type Returns the statement type of the statement to be executed.
Syntax
duckdb_statement_type duckdb_prepared_statement_type(
duckdb_prepared_statement statement
);
Parameters
• statement
• returns
Appender
Appenders are the most efficient way of loading data into DuckDB from within the C interface, and are recommended for fast data loading.
The appender is much faster than using prepared statements or individual INSERT INTO statements.
Appends are made in row‑wise format. For every column, a duckdb_append_[type] call should be made, after which the row should
be finished by calling duckdb_appender_end_row. After all rows have been appended, duckdb_appender_destroy should be
used to finalize the appender and clean up the resulting memory.
Note that duckdb_appender_destroy should always be called on the resulting appender, even if the function returns DuckDBError.
Example
duckdb_appender appender;
if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
    // handle error
}
// append the first row (1, Mark)
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);
// finish appending and flush all rows to the table
duckdb_appender_destroy(&appender);
API Reference
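duckdb_appender_create Creates an appender object. The object must be destroyed with duckdb_appender_destroy.
Syntax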
duckdb_state duckdb_appender_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
Parameters
• connection
• schema
The schema of the table to append to, or nullptr for the default schema.
• table
• out_appender
• returns
duckdb_appender_column_count Returns the number of columns in the table that belongs to the appender.
Syntax
idx_t duckdb_appender_column_count(
duckdb_appender appender
);
Parameters
• returns
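duckdb_appender_column_type Returns the type of the column at the specified index. The resulting type should be destroyed with duckdb_destroy_logical_type.
Syntax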
duckdb_logical_type duckdb_appender_column_type(
duckdb_appender appender,
idx_t col_idx
);
Parameters
• returns
duckdb_appender_error Returns the error message associated with the given appender. If the appender has no error message, this
returns nullptr instead.
The error message should not be freed. It will be de‑allocated when duckdb_appender_destroy is called.
Syntax
const char *duckdb_appender_error(
duckdb_appender appender
);
Parameters
• appender
• returns
duckdb_appender_flush Flushes the appender to the table, forcing the cache of the appender to be cleared and the data to be appended to the base table.
This should generally not be used unless you know what you are doing. Instead, call duckdb_appender_destroy when you are done with the appender.
Syntax
duckdb_state duckdb_appender_flush(
duckdb_appender appender
);
Parameters
• appender
• returns
duckdb_appender_close Close the appender, flushing all intermediate state in the appender to the table and closing it for further
appends.
Syntax
duckdb_state duckdb_appender_close(
duckdb_appender appender
);
Parameters
• appender
• returns
duckdb_appender_destroy Closes the appender and destroys it, flushing all intermediate state in the appender to the table and
de-allocating all memory associated with the appender.
Syntax
duckdb_state duckdb_appender_destroy(
duckdb_appender *appender
);
Parameters
• appender
• returns
duckdb_appender_begin_row A no-op function, provided for backwards compatibility. Only duckdb_appender_end_row is required.
Syntax
duckdb_state duckdb_appender_begin_row(
duckdb_appender appender
);
duckdb_appender_end_row Finish the current row of appends. After end_row is called, the next row can be appended.
Syntax
duckdb_state duckdb_appender_end_row(
duckdb_appender appender
);
Parameters
• appender
The appender.
• returns
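The duckdb_append_[type] functions listed below each append a single value of the given type to the current row of the appender.
Syntax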
duckdb_state duckdb_append_bool(
duckdb_appender appender,
bool value
);
Syntax
duckdb_state duckdb_append_int8(
duckdb_appender appender,
int8_t value
);
Syntax
duckdb_state duckdb_append_int16(
duckdb_appender appender,
int16_t value
);
Syntax
duckdb_state duckdb_append_int32(
duckdb_appender appender,
int32_t value
);
Syntax
duckdb_state duckdb_append_int64(
duckdb_appender appender,
int64_t value
);
Syntax
duckdb_state duckdb_append_hugeint(
duckdb_appender appender,
duckdb_hugeint value
);
Syntax
duckdb_state duckdb_append_uint8(
duckdb_appender appender,
uint8_t value
);
Syntax
duckdb_state duckdb_append_uint16(
duckdb_appender appender,
uint16_t value
);
Syntax
duckdb_state duckdb_append_uint32(
duckdb_appender appender,
uint32_t value
);
Syntax
duckdb_state duckdb_append_uint64(
duckdb_appender appender,
uint64_t value
);
Syntax
duckdb_state duckdb_append_uhugeint(
duckdb_appender appender,
duckdb_uhugeint value
);
Syntax
duckdb_state duckdb_append_float(
duckdb_appender appender,
float value
);
Syntax
duckdb_state duckdb_append_double(
duckdb_appender appender,
double value
);
Syntax
duckdb_state duckdb_append_date(
duckdb_appender appender,
duckdb_date value
);
Syntax
duckdb_state duckdb_append_time(
duckdb_appender appender,
duckdb_time value
);
Syntax
duckdb_state duckdb_append_timestamp(
duckdb_appender appender,
duckdb_timestamp value
);
Syntax
duckdb_state duckdb_append_interval(
duckdb_appender appender,
duckdb_interval value
);
Syntax
duckdb_state duckdb_append_varchar(
duckdb_appender appender,
const char *val
);
Syntax
duckdb_state duckdb_append_varchar_length(
duckdb_appender appender,
const char *val,
idx_t length
);
Syntax
duckdb_state duckdb_append_blob(
duckdb_appender appender,
const void *data,
idx_t length
);
Syntax
duckdb_state duckdb_append_null(
duckdb_appender appender
);
duckdb_append_data_chunk Appends a pre-filled data chunk to the specified appender.
The types of the data chunk must exactly match the types of the table; no casting is performed. If the types do not match or the appender
is in an invalid state, DuckDBError is returned. If the append is successful, DuckDBSuccess is returned.
Syntax
duckdb_state duckdb_append_data_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk
);
Parameters
• appender
• chunk
• returns
Table Functions
The table function API can be used to define a table function that can then be called from within DuckDB in the FROM clause of a query.
API Reference
duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
duckdb_create_table_function Creates a new empty table function.
The return value should be destroyed with duckdb_destroy_table_function.
Syntax
duckdb_table_function duckdb_create_table_function(
);
Parameters
• returns
duckdb_destroy_table_function Destroys the given table function object.
Syntax
void duckdb_destroy_table_function(
duckdb_table_function *table_function
);
Parameters
• table_function
duckdb_table_function_set_name Sets the name of the given table function.
Syntax
void duckdb_table_function_set_name(
duckdb_table_function table_function,
const char *name
);
Parameters
• table_function
• name
duckdb_table_function_add_parameter Adds a parameter to the table function.
Syntax
void duckdb_table_function_add_parameter(
duckdb_table_function table_function,
duckdb_logical_type type
);
Parameters
• table_function
• type
duckdb_table_function_add_named_parameter Adds a named parameter to the table function.
Syntax
void duckdb_table_function_add_named_parameter(
duckdb_table_function table_function,
const char *name,
duckdb_logical_type type
);
Parameters
• table_function
• name
• type
duckdb_table_function_set_extra_info Assigns extra information to the table function that can be fetched during binding, etc.
Syntax
void duckdb_table_function_set_extra_info(
duckdb_table_function table_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
Parameters
• table_function
• extra_info
• destroy
The callback that will be called to destroy the extra information (if any)
duckdb_table_function_set_bind Sets the bind function of the table function.
Syntax
void duckdb_table_function_set_bind(
duckdb_table_function table_function,
duckdb_table_function_bind_t bind
);
Parameters
• table_function
• bind
duckdb_table_function_set_init Sets the init function of the table function.
Syntax
void duckdb_table_function_set_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
Parameters
• table_function
• init
duckdb_table_function_set_local_init Sets the thread-local init function of the table function.
Syntax
void duckdb_table_function_set_local_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
Parameters
• table_function
• init
duckdb_table_function_set_function Sets the main function of the table function.
Syntax
void duckdb_table_function_set_function(
duckdb_table_function table_function,
duckdb_table_function_t function
);
Parameters
• table_function
• function
The function
duckdb_table_function_supports_projection_pushdown Sets whether or not the given table function supports projection pushdown.
If this is set to true, the system will provide a list of all required columns in the init stage through the duckdb_init_get_column_count and duckdb_init_get_column_index functions. If this is set to false (the default), the system will expect all columns to be projected.
Syntax
void duckdb_table_function_supports_projection_pushdown(
duckdb_table_function table_function,
bool pushdown
);
Parameters
• table_function
• pushdown
duckdb_register_table_function Register the table function object within the given connection.
The function requires at least a name, a bind function, an init function and a main function.
If the function is incomplete or a function with this name already exists DuckDBError is returned.
Syntax
duckdb_state duckdb_register_table_function(
duckdb_connection con,
duckdb_table_function function
);
Parameters
• con
• function
• returns
duckdb_bind_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.
Syntax
void *duckdb_bind_get_extra_info(
duckdb_bind_info info
);
Parameters
• info
• returns
duckdb_bind_add_result_column Adds a result column to the output of the table function.
Syntax
void duckdb_bind_add_result_column(
duckdb_bind_info info,
const char *name,
duckdb_logical_type type
);
Parameters
• info
• name
• type
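duckdb_bind_get_parameter_count Retrieves the number of regular (non-named) parameters to the function.
Syntax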
idx_t duckdb_bind_get_parameter_count(
duckdb_bind_info info
);
Parameters
• info
• returns
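duckdb_bind_get_parameter Retrieves the parameter at the given index. The result must be destroyed with duckdb_destroy_value.
Syntax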
duckdb_value duckdb_bind_get_parameter(
duckdb_bind_info info,
idx_t index
);
Parameters
• info
• index
• returns
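duckdb_bind_get_named_parameter Retrieves the named parameter with the given name. The result must be destroyed with duckdb_destroy_value.
Syntax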
duckdb_value duckdb_bind_get_named_parameter(
duckdb_bind_info info,
const char *name
);
Parameters
• info
• name
• returns
duckdb_bind_set_bind_data Sets the user-provided bind data in the bind object. This object can be retrieved again during execution.
Syntax
void duckdb_bind_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);
Parameters
• info
• bind_data
• destroy
The callback that will be called to destroy the bind data (if any)
duckdb_bind_set_cardinality Sets the cardinality estimate for the table function, used for optimization.
Syntax
void duckdb_bind_set_cardinality(
duckdb_bind_info info,
idx_t cardinality,
bool is_exact
);
Parameters
• info
• cardinality
• is_exact
duckdb_bind_set_error Reports that an error has occurred while calling bind.
Syntax
void duckdb_bind_set_error(
duckdb_bind_info info,
const char *error
);
Parameters
• info
• error
duckdb_init_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.
Syntax
void *duckdb_init_get_extra_info(
duckdb_init_info info
);
Parameters
• info
• returns
duckdb_init_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.
Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.
Syntax
void *duckdb_init_get_bind_data(
duckdb_init_info info
);
Parameters
• info
• returns
duckdb_init_set_init_data Sets the user-provided init data in the init object. This object can be retrieved again during execution.
Syntax
void duckdb_init_set_init_data(
duckdb_init_info info,
void *init_data,
duckdb_delete_callback_t destroy
);
Parameters
• info
• init_data
• destroy
The callback that will be called to destroy the init data (if any)
duckdb_init_get_column_count Returns the number of projected columns.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
Syntax
idx_t duckdb_init_get_column_count(
duckdb_init_info info
);
Parameters
• info
• returns
duckdb_init_get_column_index Returns the column index of the projected column at the specified position.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
Syntax
idx_t duckdb_init_get_column_index(
duckdb_init_info info,
idx_t column_index
);
Parameters
• info
• column_index
The index at which to get the projected column index, from 0 to duckdb_init_get_column_count(info) - 1
• returns
duckdb_init_set_max_threads Sets how many threads can process this table function in parallel (default: 1)
Syntax
void duckdb_init_set_max_threads(
duckdb_init_info info,
idx_t max_threads
);
Parameters
• info
• max_threads
The maximum number of threads that can process this table function
duckdb_init_set_error Reports that an error has occurred while calling init.
Syntax
void duckdb_init_set_error(
duckdb_init_info info,
const char *error
);
Parameters
• info
• error
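duckdb_function_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.
Syntax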
void *duckdb_function_get_extra_info(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.
Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.
Syntax
void *duckdb_function_get_bind_data(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_get_init_data Gets the init data set by duckdb_init_set_init_data during the init.
Syntax
void *duckdb_function_get_init_data(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_get_local_init_data Gets the thread‑local init data set by duckdb_init_set_init_data during the
local_init.
Syntax
void *duckdb_function_get_local_init_data(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_set_error Report that an error has occurred while executing the function.
Syntax
void duckdb_function_set_error(
duckdb_function_info info,
const char *error
);
Parameters
• info
• error
Replacement Scans
The replacement scan API can be used to register a callback that is called when a table is read that does not exist in the catalog. For example,
when a query such as SELECT * FROM my_table is executed and my_table does not exist, the replacement scan callback will be
called with my_table as parameter. The replacement scan can then insert a table function with a specific parameter to replace the read
of the table.
API Reference
duckdb_add_replacement_scan Adds a replacement scan definition to the specified database.
Syntax
void duckdb_add_replacement_scan(
duckdb_database db,
duckdb_replacement_callback_t replacement,
void *extra_data,
duckdb_delete_callback_t delete_callback
);
Parameters
• db
• replacement
• extra_data
• delete_callback
duckdb_replacement_scan_set_function_name Sets the replacement function name. If this function is called in the replacement callback, the replacement scan is performed. If it is not called, no replacement is performed.
Syntax
void duckdb_replacement_scan_set_function_name(
duckdb_replacement_scan_info info,
const char *function_name
);
Parameters
• info
• function_name
duckdb_replacement_scan_add_parameter Adds a parameter to the replacement scan function.
Syntax
void duckdb_replacement_scan_add_parameter(
duckdb_replacement_scan_info info,
duckdb_value parameter
);
Parameters
• info
• parameter
duckdb_replacement_scan_set_error Report that an error has occurred while executing the replacement scan.
Syntax
void duckdb_replacement_scan_set_error(
duckdb_replacement_scan_info info,
const char *error
);
Parameters
• info
• error
Complete API
API Reference
Open/Connect
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
const char *duckdb_library_version();
Configuration
duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);
Query Execution
duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);
Result Functions
duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);
Helpers
void *duckdb_malloc(size_t size);
void duckdb_free(void *ptr);
idx_t duckdb_vector_size();
bool duckdb_string_is_inlined(duckdb_string_t string);
Date/Time/Timestamp Helpers
duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);
Hugeint Helpers
double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);
Decimal Helpers
duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);
Prepared Statements
duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
Extract Statements
idx_t duckdb_extract_statements(duckdb_connection connection, const char *query, duckdb_extracted_statements *out_extracted_statements);
duckdb_state duckdb_prepare_extracted_statement(duckdb_connection connection, duckdb_extracted_statements extracted_statements, idx_t index, duckdb_prepared_statement *out_prepared_statement);
const char *duckdb_extract_statements_error(duckdb_extracted_statements extracted_statements);
void duckdb_destroy_extracted(duckdb_extracted_statements *extracted_statements);
Value Interface
void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
char *duckdb_get_varchar(duckdb_value value);
int64_t duckdb_get_int64(duckdb_value value);
Vector Interface
Table Functions
duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
Table Function
Replacement Scans
Appender
Arrow Interface
Threading Information
void duckdb_execute_tasks(duckdb_database database, idx_t max_tasks);
duckdb_task_state duckdb_create_task_state(duckdb_database database);
void duckdb_execute_tasks_state(duckdb_task_state state);
idx_t duckdb_execute_n_tasks_state(duckdb_task_state state, idx_t max_tasks);
void duckdb_finish_execution(duckdb_task_state state);
bool duckdb_task_state_is_finished(duckdb_task_state state);
void duckdb_destroy_task_state(duckdb_task_state state);
bool duckdb_execution_is_finished(duckdb_connection con);
duckdb_open Creates a new database or opens an existing database file stored at the given path. If no path is given a new in‑memory
database is created instead. The instantiated database should be closed with 'duckdb_close'.
Syntax
duckdb_state duckdb_open(
const char *path,
duckdb_database *out_database
);
Parameters
• path
Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.
• out_database
• returns
duckdb_open_ext Extended version of duckdb_open. Creates a new database or opens an existing database file stored at the given
path. The instantiated database should be closed with 'duckdb_close'.
Syntax
duckdb_state duckdb_open_ext(
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
Parameters
• path
Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.
• out_database
• config
• out_error
If set and the function returns DuckDBError, this will contain the reason why the start‑up failed. Note that the error must be freed using
duckdb_free.
• returns
duckdb_close Closes the specified database and de‑allocates all memory allocated for that database. This should be called after you
are done with any database allocated through duckdb_open or duckdb_open_ext. Note that failing to call duckdb_close (in case
of e.g., a program crash) will not cause data corruption. Still, it is recommended to always correctly close a database object after you are
done with it.
Syntax
void duckdb_close(
duckdb_database *database
);
Parameters
• database
duckdb_connect Opens a connection to a database. Connections are required to query the database, and store transactional state
associated with the connection. The instantiated connection should be closed using 'duckdb_disconnect'.
Syntax
duckdb_state duckdb_connect(
duckdb_database database,
duckdb_connection *out_connection
);
Parameters
• database
• out_connection
• returns
duckdb_interrupt Interrupts the running query.
Syntax
void duckdb_interrupt(
duckdb_connection connection
);
Parameters
• connection
duckdb_query_progress Gets the progress of the running query.
Syntax
duckdb_query_progress_type duckdb_query_progress(
duckdb_connection connection
);
Parameters
• connection
• returns
duckdb_disconnect Closes the specified connection and de‑allocates all memory allocated for that connection.
Syntax
void duckdb_disconnect(
duckdb_connection *connection
);
Parameters
• connection
duckdb_library_version Returns the version of the linked DuckDB, with a version postfix for dev versions.
Usually used for developing C extensions that must return this for a compatibility check.
Syntax
const char *duckdb_library_version(
);
duckdb_create_config Initializes an empty configuration object that can be used to provide start-up options for the DuckDB instance through duckdb_open_ext. The duckdb_config must be destroyed using 'duckdb_destroy_config'.
Syntax
duckdb_state duckdb_create_config(
duckdb_config *out_config
);
Parameters
• out_config
• returns
duckdb_config_count Returns the total number of configuration options available for use with duckdb_get_config_flag.
This should not be called in a loop as it internally loops over all the options.
Syntax
size_t duckdb_config_count(
);
Parameters
• returns
duckdb_get_config_flag Obtains a human‑readable name and description of a specific configuration option. This can be used to
e.g. display configuration options. This will succeed unless index is out of range (i.e., >= duckdb_config_count).
Syntax
duckdb_state duckdb_get_config_flag(
size_t index,
const char **out_name,
const char **out_description
);
Parameters
• index
• out_name
• out_description
• returns
duckdb_set_config Sets the specified option for the specified configuration. The configuration option is indicated by name. To
obtain a list of config options, see duckdb_get_config_flag.
This can fail if either the name is invalid, or if the value provided for the option is invalid.
Syntax
duckdb_state duckdb_set_config(
duckdb_config config,
const char *name,
const char *option
);
Parameters
• config
• name
• option
• returns
duckdb_destroy_config Destroys the specified configuration object and de‑allocates all memory allocated for the object.
Syntax
void duckdb_destroy_config(
duckdb_config *config
);
Parameters
• config
duckdb_query Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer. If the query
fails to execute, DuckDBError is returned and the error message can be retrieved by calling duckdb_result_error.
Note that after running duckdb_query, duckdb_destroy_result must be called on the result object even if the query fails, otherwise the error stored within the result will not be freed correctly.
Syntax
duckdb_state duckdb_query(
duckdb_connection connection,
const char *query,
duckdb_result *out_result
);
Parameters
• connection
• query
• out_result
• returns
duckdb_destroy_result Closes the result and de-allocates all memory allocated for that result.
Syntax
void duckdb_destroy_result(
duckdb_result *result
);
Parameters
• result
duckdb_column_name Returns the column name of the specified column. The result should not need to be freed; the column names
will automatically be destroyed when the result is destroyed.
Syntax
const char *duckdb_column_name(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
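duckdb_column_type Returns the column type of the specified column. Returns DUCKDB_TYPE_INVALID if the column is out of range.
Syntax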
duckdb_type duckdb_column_type(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_result_statement_type Returns the statement type of the statement that was executed
Syntax
duckdb_statement_type duckdb_result_statement_type(
duckdb_result result
);
Parameters
• result
• returns
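duckdb_column_logical_type Returns the logical column type of the specified column. The return type of this call should be destroyed with duckdb_destroy_logical_type.
Syntax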
duckdb_logical_type duckdb_column_logical_type(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_column_count Returns the number of columns present in the result object.
Syntax
idx_t duckdb_column_count(
duckdb_result *result
);
Parameters
• result
• returns
duckdb_row_count Returns the number of rows present in the result object.
Syntax
idx_t duckdb_row_count(
duckdb_result *result
);
Parameters
• result
• returns
duckdb_rows_changed Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.
Syntax
idx_t duckdb_rows_changed(
duckdb_result *result
);
Parameters
• result
• returns
duckdb_column_data Returns the data of a specific column of a result in columnar format.
The function returns a dense array which contains the result data. The exact type stored in the array depends on the corresponding duckdb_type (as provided by duckdb_column_type). For the exact type by which the data should be accessed, see the comments in the types section or the DUCKDB_TYPE enum.
For example, for a column of type DUCKDB_TYPE_INTEGER, rows can be accessed in the following manner:
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %d: %d\n", row, data[row]);
Syntax
void *duckdb_column_data(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_nullmask_data Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row whether or not the corresponding row is NULL. If a row is NULL, the values present in the array provided by duckdb_column_data are undefined.
Syntax
bool *duckdb_nullmask_data(
duckdb_result *result,
idx_t col
);
Parameters
• result
• col
• returns
duckdb_result_error Returns the error message contained within the result. The error is only set if duckdb_query returns
DuckDBError.
The result of this function must not be freed. It will be cleaned up when duckdb_destroy_result is called.
Syntax
const char *duckdb_result_error(
duckdb_result *result
);
Parameters
• result
• returns
duckdb_result_get_chunk Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result
is exhausted.
This function supersedes all duckdb_value functions, as well as the duckdb_column_data and duckdb_nullmask_data functions. It results in significantly better performance, and should be preferred in newer code-bases.
If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy
result functions).
Use duckdb_result_chunk_count to figure out how many chunks there are in the result.
Syntax
duckdb_data_chunk duckdb_result_get_chunk(
duckdb_result result,
idx_t chunk_index
);
Parameters
• result
• chunk_index
• returns
The resulting data chunk. Returns NULL if the chunk index is out of bounds.
duckdb_result_is_streaming Checks whether the internal result is a streaming result.
Syntax
bool duckdb_result_is_streaming(
duckdb_result result
);
Parameters
• result
• returns
duckdb_result_chunk_count Returns the number of data chunks present in the result.
Syntax
idx_t duckdb_result_chunk_count(
duckdb_result result
);
Parameters
• result
• returns
duckdb_result_return_type Returns the return_type of the given result.
Syntax
duckdb_result_type duckdb_result_return_type(
duckdb_result result
);
Parameters
• result
• returns
The return_type
duckdb_value_boolean
Syntax
bool duckdb_value_boolean(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The boolean value at the specified location, or false if the value cannot be converted.
duckdb_value_int8
Syntax
int8_t duckdb_value_int8(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The int8_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_int16
Syntax
int16_t duckdb_value_int16(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The int16_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_int32
Syntax
int32_t duckdb_value_int32(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The int32_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_int64
Syntax
int64_t duckdb_value_int64(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The int64_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_hugeint
Syntax
duckdb_hugeint duckdb_value_hugeint(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_hugeint value at the specified location, or 0 if the value cannot be converted.
duckdb_value_uhugeint
Syntax
duckdb_uhugeint duckdb_value_uhugeint(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_uhugeint value at the specified location, or 0 if the value cannot be converted.
duckdb_value_decimal
Syntax
duckdb_decimal duckdb_value_decimal(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_decimal value at the specified location, or 0 if the value cannot be converted.
duckdb_value_uint8
Syntax
uint8_t duckdb_value_uint8(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The uint8_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_uint16
Syntax
uint16_t duckdb_value_uint16(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The uint16_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_uint32
Syntax
uint32_t duckdb_value_uint32(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The uint32_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_uint64
Syntax
uint64_t duckdb_value_uint64(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The uint64_t value at the specified location, or 0 if the value cannot be converted.
duckdb_value_float
Syntax
float duckdb_value_float(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The float value at the specified location, or 0 if the value cannot be converted.
duckdb_value_double
Syntax
double duckdb_value_double(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The double value at the specified location, or 0 if the value cannot be converted.
duckdb_value_date
Syntax
duckdb_date duckdb_value_date(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_date value at the specified location, or 0 if the value cannot be converted.
duckdb_value_time
Syntax
duckdb_time duckdb_value_time(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_time value at the specified location, or 0 if the value cannot be converted.
duckdb_value_timestamp
Syntax
duckdb_timestamp duckdb_value_timestamp(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_timestamp value at the specified location, or 0 if the value cannot be converted.
duckdb_value_interval
Syntax
duckdb_interval duckdb_value_interval(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_interval value at the specified location, or 0 if the value cannot be converted.
duckdb_value_varchar
Syntax
char *duckdb_value_varchar(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• DEPRECATED
Use duckdb_value_string instead. This function does not work correctly if the string contains null bytes.
• returns
The text value at the specified location as a null‑terminated string, or nullptr if the value cannot be converted. The result must be freed
with duckdb_free.
duckdb_value_string
Syntax
duckdb_string duckdb_value_string(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
duckdb_value_varchar_internal
Syntax
char *duckdb_value_varchar_internal(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• DEPRECATED
Use duckdb_value_string_internal instead. This function does not work correctly if the string contains null bytes.
• returns
The char* value at the specified location. ONLY works on VARCHAR columns and does not auto‑cast. If the column is NOT a VARCHAR
column this function will return NULL.
duckdb_value_string_internal
Syntax
duckdb_string duckdb_value_string_internal(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• DEPRECATED
use duckdb_value_string_internal instead. This function does not work correctly if the string contains null bytes.
• returns
The duckdb_string value at the specified location. ONLY works on VARCHAR columns and does not auto‑cast. If the column is NOT a VARCHAR
column this function will return NULL.
duckdb_value_blob
Syntax
duckdb_blob duckdb_value_blob(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
The duckdb_blob value at the specified location. Returns a blob with blob.data set to nullptr if the value cannot be converted. The resulting
field "blob.data" must be freed with duckdb_free.
duckdb_value_is_null
Syntax
bool duckdb_value_is_null(
duckdb_result *result,
idx_t col,
idx_t row
);
Parameters
• returns
Returns true if the value at the specified index is NULL, and false otherwise.
duckdb_malloc Allocate size bytes of memory using the duckdb internal malloc function. Any memory allocated in this manner
should be freed using duckdb_free.
Syntax
void *duckdb_malloc(
size_t size
);
Parameters
• size
• returns
duckdb_free
Syntax
void duckdb_free(
void *ptr
);
Parameters
• ptr
duckdb_vector_size The internal vector size used by DuckDB. This is the number of tuples that will fit into a data chunk created by
duckdb_create_data_chunk.
Syntax
idx_t duckdb_vector_size(
);
Parameters
• returns
duckdb_string_is_inlined Whether or not the duckdb_string_t value is inlined. This means that the data of the string does not
have a separate allocation.
Syntax
bool duckdb_string_is_inlined(
duckdb_string_t string
);
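As a rough illustration of what "inlined" means, the sketch below models a 16-byte string handle that stores short strings directly inside the handle and keeps only a pointer for longer ones. The type sketch_string_t, the helper names, and the 12-byte threshold are assumptions made for this example and are not taken from this reference; consult duckdb.h for the real layout of duckdb_string_t.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte string handle; NOT the real duckdb_string_t. */
#define SKETCH_INLINE_LENGTH 12 /* assumed threshold, for illustration only */

typedef struct {
    uint32_t length;
    union {
        char inlined[SKETCH_INLINE_LENGTH]; /* short strings are stored here */
        struct {
            char prefix[4];   /* first bytes, kept for quick comparisons */
            const char *ptr;  /* long strings live in a separate allocation */
        } pointer;
    } value;
} sketch_string_t;

/* Builds a handle; long strings are borrowed (not copied) in this sketch. */
sketch_string_t sketch_make_string(const char *data, uint32_t length) {
    sketch_string_t s;
    memset(&s, 0, sizeof(s));
    s.length = length;
    if (length <= SKETCH_INLINE_LENGTH) {
        memcpy(s.value.inlined, data, length);
    } else {
        memcpy(s.value.pointer.prefix, data, 4);
        s.value.pointer.ptr = data;
    }
    return s;
}

/* A string is "inlined" when its bytes fit inside the handle itself. */
bool sketch_is_inlined(sketch_string_t s) {
    return s.length <= SKETCH_INLINE_LENGTH;
}

/* Returns a pointer to the string bytes, wherever they are stored. */
const char *sketch_string_data(const sketch_string_t *s) {
    return s->length <= SKETCH_INLINE_LENGTH ? s->value.inlined : s->value.pointer.ptr;
}
```

The practical consequence is that an inlined string has no separate allocation to track, while a non-inlined one points at memory owned elsewhere.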
duckdb_from_date Decompose a duckdb_date object into year, month and day (stored as duckdb_date_struct).
Syntax
duckdb_date_struct duckdb_from_date(
duckdb_date date
);
Parameters
• date
• returns
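To make the decomposition concrete, here is a minimal sketch of turning a day count into year, month and day. It assumes the date is stored as days since 1970-01-01 and uses a well-known civil-calendar algorithm as a stand-in; it is not DuckDB's implementation, and sketch_date_struct and sketch_from_days are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical mirror of a (year, month, day) result; NOT the real duckdb_date_struct. */
typedef struct {
    int32_t year;
    int8_t month; /* 1..12 */
    int8_t day;   /* 1..31 */
} sketch_date_struct;

/* Converts days since 1970-01-01 (assumed epoch) to a proleptic Gregorian date. */
sketch_date_struct sketch_from_days(int32_t days) {
    int64_t z = (int64_t)days + 719468;                /* shift epoch to 0000-03-01 */
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;  /* 400-year cycle number */
    int64_t doe = z - era * 146097;                    /* day of era, 0..146096 */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* 0..399 */
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* day of March-based year */
    int64_t mp = (5 * doy + 2) / 153;                  /* month index, 0 = March */
    sketch_date_struct out;
    out.day = (int8_t)(doy - (153 * mp + 2) / 5 + 1);
    out.month = (int8_t)(mp < 10 ? mp + 3 : mp - 9);
    /* January and February belong to the next civil year */
    out.year = (int32_t)(yoe + era * 400 + (out.month <= 2 ? 1 : 0));
    return out;
}
```

In application code, prefer duckdb_from_date itself; the sketch only shows the kind of arithmetic such a decomposition involves.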
duckdb_to_date
Syntax
duckdb_date duckdb_to_date(
duckdb_date_struct date
);
Parameters
• date
• returns
duckdb_is_finite_date
Syntax
bool duckdb_is_finite_date(
duckdb_date date
);
Parameters
• date
• returns
duckdb_from_time Decompose a duckdb_time object into hour, minute, second and microsecond (stored as duckdb_time_struct).
Syntax
duckdb_time_struct duckdb_from_time(
duckdb_time time
);
Parameters
• time
• returns
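The decomposition into hour, minute, second and microsecond can be sketched as plain integer arithmetic. The example assumes the time is stored as microseconds since 00:00:00; sketch_time_struct and both helper functions are hypothetical names, not part of the DuckDB API.

```c
#include <stdint.h>

/* Hypothetical mirror of an (hour, min, sec, micros) result; NOT the real duckdb_time_struct. */
typedef struct {
    int8_t hour;
    int8_t min;
    int8_t sec;
    int32_t micros;
} sketch_time_struct;

/* Splits microseconds since midnight (assumed representation) into fields. */
sketch_time_struct sketch_from_micros(int64_t micros) {
    sketch_time_struct t;
    t.hour = (int8_t)(micros / 3600000000LL);
    micros %= 3600000000LL;
    t.min = (int8_t)(micros / 60000000LL);
    micros %= 60000000LL;
    t.sec = (int8_t)(micros / 1000000LL);
    t.micros = (int32_t)(micros % 1000000LL);
    return t;
}

/* Inverse operation: re-composes the fields into microseconds since midnight. */
int64_t sketch_to_micros(sketch_time_struct t) {
    return (((int64_t)t.hour * 60 + t.min) * 60 + t.sec) * 1000000LL + t.micros;
}
```

For example, 13:45:30.25 corresponds to 49530250000 microseconds, and the two helpers round-trip that value.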
duckdb_create_time_tz
Syntax
duckdb_time_tz duckdb_create_time_tz(
int64_t micros,
int32_t offset
);
Parameters
• micros
• offset
• returns
duckdb_from_time_tz Use duckdb_from_time to further decompose the micros into hour, minute, second and microsecond.
Syntax
duckdb_time_tz_struct duckdb_from_time_tz(
duckdb_time_tz micros
);
Parameters
• micros
• returns
duckdb_to_time Re‑compose a duckdb_time from hour, minute, second and microsecond (duckdb_time_struct).
Syntax
duckdb_time duckdb_to_time(
duckdb_time_struct time
);
Parameters
• time
• returns
duckdb_from_timestamp
Syntax
duckdb_timestamp_struct duckdb_from_timestamp(
duckdb_timestamp ts
);
Parameters
• ts
• returns
duckdb_to_timestamp
Syntax
duckdb_timestamp duckdb_to_timestamp(
duckdb_timestamp_struct ts
);
Parameters
• ts
• returns
duckdb_is_finite_timestamp
Syntax
bool duckdb_is_finite_timestamp(
duckdb_timestamp ts
);
Parameters
• ts
• returns
duckdb_hugeint_to_double Converts a duckdb_hugeint object (as obtained from a DUCKDB_TYPE_HUGEINT column) into a
double.
Syntax
double duckdb_hugeint_to_double(
duckdb_hugeint val
);
Parameters
• val
• returns
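A minimal sketch of such a conversion follows, assuming the 128-bit value is split into an unsigned lower half and a signed upper half in two's complement. The struct sketch_hugeint and the function name are hypothetical; this is not DuckDB's implementation, and the result is rounded because a double has only 53 bits of mantissa.

```c
#include <stdint.h>

/* Hypothetical mirror of a 128-bit integer split into two halves (assumed layout). */
typedef struct {
    uint64_t lower; /* low 64 bits, unsigned */
    int64_t upper;  /* high 64 bits, carries the sign */
} sketch_hugeint;

/* value = upper * 2^64 + lower, computed in floating point. */
double sketch_hugeint_to_double(sketch_hugeint v) {
    const double two_pow_64 = 18446744073709551616.0;
    if (v.upper == -1) {
        /* Small negative values: adding -2^64 and (double)lower would cancel
           and lose the low bits, so compute the magnitude in integers first. */
        return -(double)(UINT64_MAX - v.lower) - 1.0;
    }
    return (double)v.upper * two_pow_64 + (double)v.lower;
}
```

The special case matters: without it, the value -1 (upper = -1, lower = all ones) would come out as 0 because both terms round to the same magnitude.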
duckdb_double_to_hugeint If the conversion fails because the double value is too big, the result will be 0.
Syntax
duckdb_hugeint duckdb_double_to_hugeint(
double val
);
Parameters
• val
• returns
duckdb_uhugeint_to_double Converts a duckdb_uhugeint object (as obtained from a DUCKDB_TYPE_UHUGEINT column) into
a double.
Syntax
double duckdb_uhugeint_to_double(
duckdb_uhugeint val
);
Parameters
• val
• returns
duckdb_double_to_uhugeint If the conversion fails because the double value is too big, the result will be 0.
Syntax
duckdb_uhugeint duckdb_double_to_uhugeint(
double val
);
Parameters
• val
• returns
duckdb_double_to_decimal If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.
Syntax
duckdb_decimal duckdb_double_to_decimal(
double val,
uint8_t width,
uint8_t scale
);
Parameters
• val
• width
• scale
• returns
duckdb_decimal_to_double Converts a duckdb_decimal object (as obtained from a DUCKDB_TYPE_DECIMAL column) into a
double.
Syntax
double duckdb_decimal_to_double(
duckdb_decimal val
);
Parameters
• val
• returns
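The role of the scale in this conversion can be sketched as follows: the stored integer is divided by ten to the power of the scale. The struct below is a simplified, hypothetical stand-in that holds the unscaled value in an int64_t for brevity (an assumption of this example, not the real duckdb_decimal layout).

```c
#include <stdint.h>

/* Hypothetical simplified decimal: unscaled integer plus width and scale. */
typedef struct {
    uint8_t width;  /* total number of digits */
    uint8_t scale;  /* digits after the decimal point */
    int64_t value;  /* unscaled value, e.g. 12345 for 123.45 at scale 2 */
} sketch_decimal;

/* Divides the unscaled value by 10^scale to obtain the floating-point value. */
double sketch_decimal_to_double(sketch_decimal d) {
    double divisor = 1.0;
    for (uint8_t i = 0; i < d.scale; i++) {
        divisor *= 10.0;
    }
    return (double)d.value / divisor;
}
```

A decimal with width 5, scale 2 and unscaled value 12345 therefore converts to approximately 123.45.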
duckdb_prepare Note that after calling duckdb_prepare, the prepared statement should always be destroyed using duckdb_destroy_prepare,
even if the prepare fails.
If the prepare fails, duckdb_prepare_error can be called to obtain the reason why the prepare failed.
Syntax
duckdb_state duckdb_prepare(
duckdb_connection connection,
const char *query,
duckdb_prepared_statement *out_prepared_statement
);
Parameters
• connection
• query
• out_prepared_statement
• returns
duckdb_destroy_prepare Closes the prepared statement and de‑allocates all memory allocated for the statement.
Syntax
void duckdb_destroy_prepare(
duckdb_prepared_statement *prepared_statement
);
Parameters
• prepared_statement
duckdb_prepare_error Returns the error message associated with the given prepared statement. If the prepared statement has no
error message, this returns nullptr instead.
The error message should not be freed. It will be de‑allocated when duckdb_destroy_prepare is called.
Syntax
const char *duckdb_prepare_error(
duckdb_prepared_statement prepared_statement
);
Parameters
• prepared_statement
• returns
duckdb_nparams Returns the number of parameters that can be provided to the given prepared statement.
Syntax
idx_t duckdb_nparams(
duckdb_prepared_statement prepared_statement
);
Parameters
• prepared_statement
duckdb_parameter_name Returns the name used to identify the parameter. The returned string should be freed using duckdb_free.
Returns NULL if the index is out of range for the provided prepared statement.
Syntax
const char *duckdb_parameter_name(
duckdb_prepared_statement prepared_statement,
idx_t index
);
Parameters
• prepared_statement
The prepared statement from which to get the parameter name.
duckdb_param_type Returns the parameter type for the parameter at the given index.
Returns DUCKDB_TYPE_INVALID if the parameter index is out of range or the statement was not successfully prepared.
Syntax
duckdb_type duckdb_param_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
Parameters
• prepared_statement
• param_idx
• returns
duckdb_clear_bindings
Syntax
duckdb_state duckdb_clear_bindings(
duckdb_prepared_statement prepared_statement
);
duckdb_prepared_statement_type
Syntax
duckdb_statement_type duckdb_prepared_statement_type(
duckdb_prepared_statement statement
);
Parameters
• statement
• returns
duckdb_bind_value
Syntax
duckdb_state duckdb_bind_value(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_value val
);
duckdb_bind_parameter_index Retrieves the index of the parameter for the prepared statement, identified by name.
Syntax
duckdb_state duckdb_bind_parameter_index(
duckdb_prepared_statement prepared_statement,
idx_t *param_idx_out,
const char *name
);
duckdb_bind_boolean Binds a bool value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_boolean(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
bool val
);
duckdb_bind_int8 Binds an int8_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_int8(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int8_t val
);
duckdb_bind_int16 Binds an int16_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_int16(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int16_t val
);
duckdb_bind_int32 Binds an int32_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_int32(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int32_t val
);
duckdb_bind_int64 Binds an int64_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_int64(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int64_t val
);
duckdb_bind_hugeint Binds a duckdb_hugeint value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_hugeint(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_hugeint val
);
duckdb_bind_uhugeint Binds a duckdb_uhugeint value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_uhugeint(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_uhugeint val
);
duckdb_bind_decimal Binds a duckdb_decimal value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_decimal(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_decimal val
);
duckdb_bind_uint8 Binds a uint8_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_uint8(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint8_t val
);
duckdb_bind_uint16 Binds a uint16_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_uint16(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint16_t val
);
duckdb_bind_uint32 Binds a uint32_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_uint32(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint32_t val
);
duckdb_bind_uint64 Binds a uint64_t value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_uint64(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint64_t val
);
duckdb_bind_float Binds a float value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_float(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
float val
);
duckdb_bind_double Binds a double value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_double(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
double val
);
duckdb_bind_date Binds a duckdb_date value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_date(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_date val
);
duckdb_bind_time Binds a duckdb_time value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_time(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_time val
);
duckdb_bind_timestamp Binds a duckdb_timestamp value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_timestamp(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_timestamp val
);
duckdb_bind_interval Binds a duckdb_interval value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_interval(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_interval val
);
duckdb_bind_varchar Binds a null‑terminated varchar value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_varchar(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const char *val
);
duckdb_bind_varchar_length Binds a varchar value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_varchar_length(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const char *val,
idx_t length
);
duckdb_bind_blob Binds a blob value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_blob(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const void *data,
idx_t length
);
duckdb_bind_null Binds a NULL value to the prepared statement at the specified index.
Syntax
duckdb_state duckdb_bind_null(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
duckdb_execute_prepared Executes the prepared statement with the given bound parameters, and returns a materialized query
result.
This method can be called multiple times for each prepared statement, and the parameters can be modified between calls to this
function.
Syntax
duckdb_state duckdb_execute_prepared(
duckdb_prepared_statement prepared_statement,
duckdb_result *out_result
);
Parameters
• prepared_statement
• out_result
• returns
duckdb_execute_prepared_streaming Executes the prepared statement with the given bound parameters, and returns an
optionally‑streaming query result. To determine if the resulting query was in fact streamed, use duckdb_result_is_streaming.
This method can be called multiple times for each prepared statement, and the parameters can be modified between calls to this
function.
Syntax
duckdb_state duckdb_execute_prepared_streaming(
duckdb_prepared_statement prepared_statement,
duckdb_result *out_result
);
Parameters
• prepared_statement
• out_result
• returns
duckdb_extract_statements Extract all statements from a query. Note that after calling duckdb_extract_statements, the
extracted statements should always be destroyed using duckdb_destroy_extracted, even if no statements were extracted.
If the extract fails, duckdb_extract_statements_error can be called to obtain the reason why the extract failed.
Syntax
idx_t duckdb_extract_statements(
duckdb_connection connection,
const char *query,
duckdb_extracted_statements *out_extracted_statements
);
Parameters
• connection
• query
• out_extracted_statements
• returns
duckdb_prepare_extracted_statement If the prepare fails, duckdb_prepare_error can be called to obtain the reason why the prepare failed.
Syntax
duckdb_state duckdb_prepare_extracted_statement(
duckdb_connection connection,
duckdb_extracted_statements extracted_statements,
idx_t index,
duckdb_prepared_statement *out_prepared_statement
);
Parameters
• connection
• extracted_statements
• index
• out_prepared_statement
• returns
duckdb_extract_statements_error Returns the error message contained within the extracted statements. The result of this
function must not be freed. It will be cleaned up when duckdb_destroy_extracted is called.
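Syntax
const char *duckdb_extract_statements_error(
duckdb_extracted_statements extracted_statements
);
Parameters
• extracted_statements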
• returns
duckdb_destroy_extracted
Syntax
void duckdb_destroy_extracted(
duckdb_extracted_statements *extracted_statements
);
Parameters
• extracted_statements
duckdb_pending_prepared Executes the prepared statement with the given bound parameters, and returns a pending result. The
pending result represents an intermediate structure for a query that is not yet fully executed. The pending result can be used to incrementally
execute a query, returning control to the client between tasks.
Note that after calling duckdb_pending_prepared, the pending result should always be destroyed using duckdb_destroy_pending,
even if this function returns DuckDBError.
Syntax
duckdb_state duckdb_pending_prepared(
duckdb_prepared_statement prepared_statement,
duckdb_pending_result *out_result
);
Parameters
• prepared_statement
• out_result
• returns
duckdb_pending_prepared_streaming Executes the prepared statement with the given bound parameters, and returns a pending
result. This pending result will create a streaming duckdb_result when executed. The pending result represents an intermediate structure
for a query that is not yet fully executed.
Note that after calling duckdb_pending_prepared_streaming, the pending result should always be destroyed using duckdb_destroy_pending,
even if this function returns DuckDBError.
Syntax
duckdb_state duckdb_pending_prepared_streaming(
duckdb_prepared_statement prepared_statement,
duckdb_pending_result *out_result
);
Parameters
• prepared_statement
• out_result
• returns
duckdb_destroy_pending Closes the pending result and de‑allocates all memory allocated for the result.
Syntax
void duckdb_destroy_pending(
duckdb_pending_result *pending_result
);
Parameters
• pending_result
duckdb_pending_error Returns the error message contained within the pending result.
The result of this function must not be freed. It will be cleaned up when duckdb_destroy_pending is called.
Syntax
const char *duckdb_pending_error(
duckdb_pending_result pending_result
);
Parameters
• pending_result
• returns
duckdb_pending_execute_task Executes a single task within the query, returning whether or not the query is ready.
If this returns DUCKDB_PENDING_RESULT_READY, the duckdb_execute_pending function can be called to obtain the result. If this returns
DUCKDB_PENDING_RESULT_NOT_READY, the duckdb_pending_execute_task function should be called again. If this returns
DUCKDB_PENDING_ERROR, an error occurred during execution.
Syntax
duckdb_pending_state duckdb_pending_execute_task(
duckdb_pending_result pending_result
);
Parameters
• pending_result
• returns
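The polling protocol described for duckdb_pending_execute_task (run one task, check the state, repeat until ready or error) can be sketched without DuckDB itself. Everything below is a hypothetical model: sketch_state, sketch_pending and both functions are invented for illustration and only mimic the three outcomes named in the text.

```c
#include <stdbool.h>

/* Hypothetical states mirroring the three outcomes named in the text;
   these are NOT the real duckdb_pending_state values. */
typedef enum { SKETCH_READY, SKETCH_NOT_READY, SKETCH_ERROR } sketch_state;

/* Hypothetical pending query: a counter of remaining tasks. */
typedef struct {
    int tasks_left; /* tasks still to run before the result is ready */
    bool will_fail; /* simulates an error during execution */
} sketch_pending;

/* Runs a single task and reports whether the result is ready. */
sketch_state sketch_execute_task(sketch_pending *p) {
    if (p->will_fail) {
        return SKETCH_ERROR;
    }
    if (p->tasks_left > 0) {
        p->tasks_left--;
    }
    return p->tasks_left == 0 ? SKETCH_READY : SKETCH_NOT_READY;
}

/* Polls until ready or error. Returns the number of task calls made,
   or -1 if an error state was reported. */
int sketch_drive(sketch_pending *p) {
    int calls = 0;
    for (;;) {
        sketch_state state = sketch_execute_task(p);
        calls++;
        if (state == SKETCH_ERROR) {
            return -1;
        }
        if (state == SKETCH_READY) {
            return calls;
        }
        /* Not ready yet: a client could do other work here before polling again. */
    }
}
```

With the real API, the same loop shape would call duckdb_pending_execute_task, then duckdb_execute_pending once the ready state is reported, and duckdb_destroy_pending in every case.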
duckdb_pending_execute_check_state
Syntax
duckdb_pending_state duckdb_pending_execute_check_state(
duckdb_pending_result pending_result
);
Parameters
• pending_result
• returns
duckdb_execute_pending Fully execute a pending query result, returning the final query result.
If duckdb_pending_execute_task has been called until DUCKDB_PENDING_RESULT_READY was returned, this returns quickly. Otherwise,
all remaining tasks must be executed first.
Syntax
duckdb_state duckdb_execute_pending(
duckdb_pending_result pending_result,
duckdb_result *out_result
);
Parameters
• pending_result
• out_result
• returns
duckdb_pending_execution_is_finished
Syntax
bool duckdb_pending_execution_is_finished(
duckdb_pending_state pending_state
);
Parameters
• pending_state
• returns
duckdb_destroy_value Destroys the value and de‑allocates all memory allocated for that type.
Syntax
void duckdb_destroy_value(
duckdb_value *value
);
Parameters
• value
duckdb_create_varchar
Syntax
duckdb_value duckdb_create_varchar(
const char *text
);
Parameters
• text
• returns
duckdb_create_varchar_length
Syntax
duckdb_value duckdb_create_varchar_length(
const char *text,
idx_t length
);
Parameters
• text
The text
• length
• returns
duckdb_create_int64
Syntax
duckdb_value duckdb_create_int64(
int64_t val
);
Parameters
• val
• returns
duckdb_create_struct_value
Syntax
duckdb_value duckdb_create_struct_value(
duckdb_logical_type type,
duckdb_value *values
);
Parameters
• type
• values
• returns
duckdb_create_list_value Creates a list value from a type and an array of values of length value_count.
Syntax
duckdb_value duckdb_create_list_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
Parameters
• type
• values
• value_count
• returns
duckdb_create_array_value Creates an array value from a type and an array of values of length value_count.
Syntax
duckdb_value duckdb_create_array_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
Parameters
• type
• values
• value_count
• returns
duckdb_get_varchar Obtains a string representation of the given value. The result must be destroyed with duckdb_free.
Syntax
char *duckdb_get_varchar(
duckdb_value value
);
Parameters
• value
The value
• returns
duckdb_get_int64
Syntax
int64_t duckdb_get_int64(
duckdb_value value
);
Parameters
• value
The value
• returns
duckdb_create_logical_type Creates a duckdb_logical_type from a standard primitive type. The resulting type should be
destroyed with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_logical_type(
duckdb_type type
);
Parameters
• type
• returns
duckdb_logical_type_get_alias Returns the alias of a duckdb_logical_type, if one is set, else NULL. The result must be
destroyed with duckdb_free.
Syntax
char *duckdb_logical_type_get_alias(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_create_list_type Creates a list type from its child type. The resulting type should be destroyed with
duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_list_type(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_create_array_type Creates an array type from its child type. The resulting type should be destroyed with
duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_array_type(
duckdb_logical_type type,
idx_t array_size
);
Parameters
• type
• array_size
• returns
duckdb_create_map_type Creates a map type from its key type and value type. The resulting type should be destroyed with
duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_map_type(
duckdb_logical_type key_type,
duckdb_logical_type value_type
);
Parameters
• key_type
• value_type
• returns
duckdb_create_union_type Creates a UNION type from the passed types array. The resulting type should be destroyed with
duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_union_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
Parameters
• member_types
• member_names
• member_count
• returns
duckdb_create_struct_type Creates a STRUCT type from the passed member name and type arrays. The resulting type should
be destroyed with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_struct_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
Parameters
• member_types
• member_names
• member_count
• returns
duckdb_create_enum_type Creates an ENUM type from the passed member name array. The resulting type should be destroyed
with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_enum_type(
const char **member_names,
idx_t member_count
);
Parameters
• member_names
• member_count
• returns
duckdb_create_decimal_type Creates a duckdb_logical_type of type decimal with the specified width and scale. The
resulting type should be destroyed with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_create_decimal_type(
uint8_t width,
uint8_t scale
);
Parameters
• width
• scale
• returns
duckdb_get_type_id
Syntax
duckdb_type duckdb_get_type_id(
duckdb_logical_type type
);
Parameters
• type
• returns
The type id
duckdb_decimal_width
Syntax
uint8_t duckdb_decimal_width(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_decimal_scale
Syntax
uint8_t duckdb_decimal_scale(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_decimal_internal_type
Syntax
duckdb_type duckdb_decimal_internal_type(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_enum_internal_type
Syntax
duckdb_type duckdb_enum_internal_type(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_enum_dictionary_size
Syntax
uint32_t duckdb_enum_dictionary_size(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_enum_dictionary_value Retrieves the dictionary value at the specified position from the enum.
Syntax
char *duckdb_enum_dictionary_value(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
The string value of the enum type. Must be freed with duckdb_free.
duckdb_list_type_child_type
Syntax
duckdb_logical_type duckdb_list_type_child_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The child type of the list type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_array_type_child_type
Syntax
duckdb_logical_type duckdb_array_type_child_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The child type of the array type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_array_type_array_size
Syntax
idx_t duckdb_array_type_array_size(
duckdb_logical_type type
);
Parameters
• type
• returns
The fixed number of elements the values of this array type can store.
duckdb_map_type_key_type
Syntax
duckdb_logical_type duckdb_map_type_key_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The key type of the map type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_map_type_value_type
Syntax
duckdb_logical_type duckdb_map_type_value_type(
duckdb_logical_type type
);
Parameters
• type
• returns
The value type of the map type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_struct_type_child_count
Syntax
idx_t duckdb_struct_type_child_count(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_struct_type_child_name
Syntax
char *duckdb_struct_type_child_name(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
duckdb_struct_type_child_type Retrieves the child type of the given struct type at the specified index.
Syntax
duckdb_logical_type duckdb_struct_type_child_type(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
The child type of the struct type. Must be destroyed with duckdb_destroy_logical_type.
duckdb_union_type_member_count Returns the number of members that the union type has.
Syntax
idx_t duckdb_union_type_member_count(
duckdb_logical_type type
);
Parameters
• type
• returns
duckdb_union_type_member_name
Syntax
char *duckdb_union_type_member_name(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
duckdb_union_type_member_type Retrieves the child type of the given union member at the specified index.
Syntax
duckdb_logical_type duckdb_union_type_member_type(
duckdb_logical_type type,
idx_t index
);
Parameters
• type
• index
• returns
The child type of the union member. Must be destroyed with duckdb_destroy_logical_type.
duckdb_destroy_logical_type Destroys the logical type and de‑allocates all memory allocated for that type.
Syntax
void duckdb_destroy_logical_type(
duckdb_logical_type *type
);
Parameters
• type
duckdb_create_data_chunk
Syntax
duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);
Parameters
• types
• column_count
• returns
duckdb_destroy_data_chunk Destroys the data chunk and de‑allocates all memory allocated for that chunk.
Syntax
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);
Parameters
• chunk
duckdb_data_chunk_reset Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
Syntax
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);
Parameters
• chunk
duckdb_data_chunk_get_column_count
Syntax
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);
Parameters
• chunk
• returns
duckdb_data_chunk_get_vector Retrieves the vector at the specified column index in the data chunk.
The pointer to the vector is valid for as long as the chunk is alive. It does NOT need to be destroyed.
Syntax
duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);
Parameters
• chunk
• returns
The vector
duckdb_data_chunk_get_size
Syntax
idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);
Parameters
• chunk
• returns
duckdb_data_chunk_set_size
Syntax
void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);
Parameters
• chunk
• size
duckdb_vector_get_column_type
Syntax
duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);
Parameters
• vector
• returns
duckdb_vector_get_data The data pointer can be used to read or write values from the vector. How to read or write values depends on the type of the vector.
Syntax
void *duckdb_vector_get_data(
duckdb_vector vector
);
Parameters
• vector
• returns
duckdb_vector_get_validity The validity mask is a bitset that signifies null‑ness within the data chunk. It is a series of uint64_t values, where each uint64_t value contains
validity for 64 tuples. The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).
idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry);
Syntax
uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);
Parameters
• vector
• returns
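The bit layout described above (one uint64_t entry per 64 rows, bit set to 1 for valid rows) can be exercised with a small stand-alone sketch. The function names are hypothetical, and treating a NULL mask pointer as "every row is valid" is an assumption of this example rather than something stated in this section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Tests one row in a validity bitset laid out as 64 rows per uint64_t entry. */
bool sketch_row_is_valid(const uint64_t *validity_mask, uint64_t row_idx) {
    if (validity_mask == NULL) {
        return true; /* assumption: no mask means no NULLs */
    }
    uint64_t entry_idx = row_idx / 64;
    uint64_t idx_in_entry = row_idx % 64;
    /* The 1 must be 64 bits wide, otherwise shifts of 32 or more are undefined. */
    return (validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry)) != 0;
}

/* Counts the NULL rows among the first row_count rows. */
size_t sketch_count_nulls(const uint64_t *validity_mask, uint64_t row_count) {
    size_t nulls = 0;
    for (uint64_t row = 0; row < row_count; row++) {
        if (!sketch_row_is_valid(validity_mask, row)) {
            nulls++;
        }
    }
    return nulls;
}
```

In application code, duckdb_validity_row_is_valid performs the per-row check on the pointer returned by duckdb_vector_get_validity.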
duckdb_vector_ensure_validity_writable After this function is called, duckdb_vector_get_validity will ALWAYS return non‑NULL. This allows null values to be written to the
vector, regardless of whether a validity mask was present before.
Syntax
void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);
Parameters
• vector
duckdb_vector_assign_string_element
Syntax
void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);
Parameters
• vector
• index
• str
duckdb_vector_assign_string_element_len Assigns a string element in the vector at the specified location. You may also
use this function to assign BLOBs.
Syntax
void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);
Parameters
• vector
• index
• str
The string
• str_len
duckdb_list_vector_get_child
Syntax
duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);
Parameters
• vector
The vector
• returns
duckdb_list_vector_get_size
Syntax
idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);
Parameters
• vector
The vector
• returns
duckdb_list_vector_set_size Sets the total size of the underlying child‑vector of a list vector.
Syntax
duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);
Parameters
• vector
• size
• returns
duckdb_list_vector_reserve
Syntax
duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);
Parameters
• vector
• required_capacity
• returns
duckdb_struct_vector_get_child
Syntax
duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);
Parameters
• vector
The vector
• index
• returns
duckdb_array_vector_get_child Retrieves the child vector of an array vector.
The resulting vector is valid as long as the parent vector is valid. The resulting vector has the size of the parent vector multiplied by the
array size.
Syntax
duckdb_vector duckdb_array_vector_get_child(
duckdb_vector vector
);
Parameters
• vector
The vector
• returns
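The size relationship stated for duckdb_array_vector_get_child can be illustrated with ordinary buffers. The sketch below is plain C and does not call DuckDB: the function names are hypothetical, and it assumes the fixed-size arrays of consecutive rows are stored back to back in the child, so that element i of row r sits at position r * array_size + i.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: idx_t is a 64-bit unsigned integer. */
typedef uint64_t idx_t;

/* Number of entries in the child: parent size multiplied by array size. */
idx_t array_child_size(idx_t parent_size, idx_t array_size) {
    return parent_size * array_size;
}

/* Assumed layout: the arrays of consecutive rows are stored back to back. */
idx_t array_child_index(idx_t row, idx_t array_size, idx_t i) {
    return row * array_size + i;
}

/* Hypothetical consumer: sums the elements of one row's array,
 * reading them from a flat child buffer. */
int64_t array_row_sum(const int32_t *child_data, idx_t row, idx_t array_size) {
    int64_t sum = 0;
    for (idx_t i = 0; i < array_size; i++) {
        sum += child_data[array_child_index(row, array_size, i)];
    }
    return sum;
}
```

With two rows of three elements each, the child holds six entries, and row 1 starts at child position 3.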
duckdb_validity_row_is_valid Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.
Syntax
bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);
Parameters
• validity
• row
• returns
Syntax
void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);
Parameters
• validity
• row
• valid
Syntax
void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);
Parameters
• validity
• row
Syntax
void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);
Parameters
• validity
• row
Syntax
duckdb_table_function duckdb_create_table_function(
);
Parameters
• returns
Syntax
void duckdb_destroy_table_function(
duckdb_table_function *table_function
);
Parameters
• table_function
Syntax
void duckdb_table_function_set_name(
duckdb_table_function table_function,
const char *name
);
Parameters
• table_function
• name
Syntax
void duckdb_table_function_add_parameter(
duckdb_table_function table_function,
duckdb_logical_type type
);
Parameters
• table_function
• type
Syntax
void duckdb_table_function_add_named_parameter(
duckdb_table_function table_function,
const char *name,
duckdb_logical_type type
);
Parameters
• table_function
• name
• type
duckdb_table_function_set_extra_info Assigns extra information to the table function that can be fetched during binding,
etc.
Syntax
void duckdb_table_function_set_extra_info(
duckdb_table_function table_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
Parameters
• table_function
• extra_info
• destroy
The callback that will be called to destroy the extra information (if any)
Syntax
void duckdb_table_function_set_bind(
duckdb_table_function table_function,
duckdb_table_function_bind_t bind
);
Parameters
• table_function
• bind
Syntax
void duckdb_table_function_set_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
Parameters
• table_function
• init
Syntax
void duckdb_table_function_set_local_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
Parameters
• table_function
• init
Syntax
void duckdb_table_function_set_function(
duckdb_table_function table_function,
duckdb_table_function_t function
);
Parameters
• table_function
• function
The function
duckdb_table_function_supports_projection_pushdown Sets whether or not the given table function supports projection pushdown.
If this is set to true, the system will provide a list of all required columns in the init stage through the duckdb_init_get_column_count
and duckdb_init_get_column_index functions. If this is set to false (the default), the system will expect all columns to be
projected.
Syntax
void duckdb_table_function_supports_projection_pushdown(
duckdb_table_function table_function,
bool pushdown
);
Parameters
• table_function
• pushdown
duckdb_register_table_function Register the table function object within the given connection.
The function requires at least a name, a bind function, an init function and a main function.
If the function is incomplete or a function with this name already exists, DuckDBError is returned.
Syntax
duckdb_state duckdb_register_table_function(
duckdb_connection con,
duckdb_table_function function
);
Parameters
• con
• function
• returns
Syntax
void *duckdb_bind_get_extra_info(
duckdb_bind_info info
);
Parameters
• info
• returns
Syntax
void duckdb_bind_add_result_column(
duckdb_bind_info info,
const char *name,
duckdb_logical_type type
);
Parameters
• info
• name
• type
Syntax
idx_t duckdb_bind_get_parameter_count(
duckdb_bind_info info
);
Parameters
• info
• returns
Syntax
duckdb_value duckdb_bind_get_parameter(
duckdb_bind_info info,
idx_t index
);
Parameters
• info
• index
• returns
Syntax
duckdb_value duckdb_bind_get_named_parameter(
duckdb_bind_info info,
const char *name
);
Parameters
• info
• name
• returns
duckdb_bind_set_bind_data Sets the user‑provided bind data in the bind object. This object can be retrieved again during execution.
Syntax
void duckdb_bind_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);
Parameters
• info
• bind_data
• destroy
The callback that will be called to destroy the bind data (if any)
duckdb_bind_set_cardinality Sets the cardinality estimate for the table function, used for optimization.
Syntax
void duckdb_bind_set_cardinality(
duckdb_bind_info info,
idx_t cardinality,
bool is_exact
);
Parameters
• info
• cardinality
• is_exact
Syntax
void duckdb_bind_set_error(
duckdb_bind_info info,
const char *error
);
Parameters
• info
• error
Syntax
void *duckdb_init_get_extra_info(
duckdb_init_info info
);
Parameters
• info
• returns
duckdb_init_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.
Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.
Syntax
void *duckdb_init_get_bind_data(
duckdb_init_info info
);
Parameters
• info
• returns
duckdb_init_set_init_data Sets the user‑provided init data in the init object. This object can be retrieved again during execution.
Syntax
void duckdb_init_set_init_data(
duckdb_init_info info,
void *init_data,
duckdb_delete_callback_t destroy
);
Parameters
• info
• init_data
• destroy
The callback that will be called to destroy the init data (if any)
duckdb_init_get_column_count Returns the number of projected columns.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
Syntax
idx_t duckdb_init_get_column_count(
duckdb_init_info info
);
Parameters
• info
• returns
duckdb_init_get_column_index Returns the column index of the projected column at the specified position.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
Syntax
idx_t duckdb_init_get_column_index(
duckdb_init_info info,
idx_t column_index
);
Parameters
• info
• column_index
The index at which to get the projected column index, from 0 to duckdb_init_get_column_count(info) - 1
• returns
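To make the role of the two projection functions concrete, the sketch below mimics them in plain C without calling DuckDB. Everything in it is hypothetical: projection_t stands in for the information that duckdb_init_get_column_count and duckdb_init_get_column_index expose, and emit_projected_row shows how a table function might write only the requested columns, in the requested order.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: idx_t is a 64-bit unsigned integer. */
typedef uint64_t idx_t;

/* Hypothetical stand-in for the projection information available
 * during the init stage when projection pushdown is enabled. */
typedef struct {
    const idx_t *column_ids; /* table column index for each output position */
    idx_t column_count;      /* number of projected columns */
} projection_t;

/* Mimics duckdb_init_get_column_count. */
idx_t projection_column_count(const projection_t *p) {
    return p->column_count;
}

/* Mimics duckdb_init_get_column_index; pos must be below the column count. */
idx_t projection_column_index(const projection_t *p, idx_t pos) {
    return p->column_ids[pos];
}

/* Copies one table row into the output, keeping only the projected columns. */
void emit_projected_row(const projection_t *p, const int64_t *table_row, int64_t *out_row) {
    idx_t n = projection_column_count(p);
    for (idx_t pos = 0; pos < n; pos++) {
        out_row[pos] = table_row[projection_column_index(p, pos)];
    }
}
```

If a query needs only the third and first columns of a three-column table, the projection is {2, 0} and each output row has two values.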
duckdb_init_set_max_threads Sets how many threads can process this table function in parallel (default: 1).
Syntax
void duckdb_init_set_max_threads(
duckdb_init_info info,
idx_t max_threads
);
Parameters
• info
• max_threads
The maximum number of threads that can process this table function
Syntax
void duckdb_init_set_error(
duckdb_init_info info,
const char *error
);
Parameters
• info
• error
Syntax
void *duckdb_function_get_extra_info(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.
Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.
Syntax
void *duckdb_function_get_bind_data(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_get_init_data Gets the init data set by duckdb_init_set_init_data during the init.
Syntax
void *duckdb_function_get_init_data(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_get_local_init_data Gets the thread‑local init data set by duckdb_init_set_init_data during the
local_init.
Syntax
void *duckdb_function_get_local_init_data(
duckdb_function_info info
);
Parameters
• info
• returns
duckdb_function_set_error Report that an error has occurred while executing the function.
Syntax
void duckdb_function_set_error(
duckdb_function_info info,
const char *error
);
Parameters
• info
• error
Syntax
void duckdb_add_replacement_scan(
duckdb_database db,
duckdb_replacement_callback_t replacement,
void *extra_data,
duckdb_delete_callback_t delete_callback
);
Parameters
• db
• replacement
• extra_data
• delete_callback
duckdb_replacement_scan_set_function_name Sets the replacement function name. If this function is called in the replacement
callback, the replacement scan is performed. If it is not called, the replacement scan is not performed.
Syntax
void duckdb_replacement_scan_set_function_name(
duckdb_replacement_scan_info info,
const char *function_name
);
Parameters
• info
• function_name
Syntax
void duckdb_replacement_scan_add_parameter(
duckdb_replacement_scan_info info,
duckdb_value parameter
);
Parameters
• info
• parameter
duckdb_replacement_scan_set_error Report that an error has occurred while executing the replacement scan.
Syntax
void duckdb_replacement_scan_set_error(
duckdb_replacement_scan_info info,
const char *error
);
Parameters
• info
• error
Syntax
duckdb_state duckdb_appender_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
Parameters
• connection
• schema
The schema of the table to append to, or nullptr for the default schema.
• table
• out_appender
• returns
duckdb_appender_column_count Returns the number of columns in the table that belongs to the appender.
Syntax
idx_t duckdb_appender_column_count(
duckdb_appender appender
);
Parameters
• appender
• returns
Syntax
duckdb_logical_type duckdb_appender_column_type(
duckdb_appender appender,
idx_t col_idx
);
Parameters
• appender
• col_idx
• returns
duckdb_appender_error Returns the error message associated with the given appender. If the appender has no error message, this
returns nullptr instead.
The error message should not be freed. It will be de‑allocated when duckdb_appender_destroy is called.
Syntax
const char *duckdb_appender_error(
duckdb_appender appender
);
Parameters
• appender
• returns
duckdb_appender_flush Flush the appender to the table, forcing the cache of the appender to be cleared and the data to be
appended to the base table.
This should generally not be used unless you know what you are doing. Instead, call duckdb_appender_destroy when you are done
with the appender.
Syntax
duckdb_state duckdb_appender_flush(
duckdb_appender appender
);
Parameters
• appender
• returns
duckdb_appender_close Close the appender, flushing all intermediate state in the appender to the table and closing it for further
appends.
Syntax
duckdb_state duckdb_appender_close(
duckdb_appender appender
);
Parameters
• appender
• returns
duckdb_appender_destroy Closes the appender and destroys it, flushing all intermediate state in the appender to the table and
de‑allocating all memory associated with the appender.
Syntax
duckdb_state duckdb_appender_destroy(
duckdb_appender *appender
);
Parameters
• appender
• returns
duckdb_appender_begin_row A nop function, provided for backwards compatibility reasons. Does nothing. Only
duckdb_appender_end_row is required.
Syntax
duckdb_state duckdb_appender_begin_row(
duckdb_appender appender
);
duckdb_appender_end_row Finish the current row of appends. After end_row is called, the next row can be appended.
Syntax
duckdb_state duckdb_appender_end_row(
duckdb_appender appender
);
Parameters
• appender
The appender.
• returns
Syntax
duckdb_state duckdb_append_bool(
duckdb_appender appender,
bool value
);
Syntax
duckdb_state duckdb_append_int8(
duckdb_appender appender,
int8_t value
);
Syntax
duckdb_state duckdb_append_int16(
duckdb_appender appender,
int16_t value
);
Syntax
duckdb_state duckdb_append_int32(
duckdb_appender appender,
int32_t value
);
Syntax
duckdb_state duckdb_append_int64(
duckdb_appender appender,
int64_t value
);
Syntax
duckdb_state duckdb_append_hugeint(
duckdb_appender appender,
duckdb_hugeint value
);
Syntax
duckdb_state duckdb_append_uint8(
duckdb_appender appender,
uint8_t value
);
Syntax
duckdb_state duckdb_append_uint16(
duckdb_appender appender,
uint16_t value
);
Syntax
duckdb_state duckdb_append_uint32(
duckdb_appender appender,
uint32_t value
);
Syntax
duckdb_state duckdb_append_uint64(
duckdb_appender appender,
uint64_t value
);
Syntax
duckdb_state duckdb_append_uhugeint(
duckdb_appender appender,
duckdb_uhugeint value
);
Syntax
duckdb_state duckdb_append_float(
duckdb_appender appender,
float value
);
Syntax
duckdb_state duckdb_append_double(
duckdb_appender appender,
double value
);
Syntax
duckdb_state duckdb_append_date(
duckdb_appender appender,
duckdb_date value
);
Syntax
duckdb_state duckdb_append_time(
duckdb_appender appender,
duckdb_time value
);
Syntax
duckdb_state duckdb_append_timestamp(
duckdb_appender appender,
duckdb_timestamp value
);
Syntax
duckdb_state duckdb_append_interval(
duckdb_appender appender,
duckdb_interval value
);
Syntax
duckdb_state duckdb_append_varchar(
duckdb_appender appender,
const char *val
);
Syntax
duckdb_state duckdb_append_varchar_length(
duckdb_appender appender,
const char *val,
idx_t length
);
Syntax
duckdb_state duckdb_append_blob(
duckdb_appender appender,
const void *data,
idx_t length
);
Syntax
duckdb_state duckdb_append_null(
duckdb_appender appender
);
duckdb_append_data_chunk Appends a pre‑filled data chunk to the specified appender.
The types of the data chunk must exactly match the types of the table; no casting is performed. If the types do not match or the appender
is in an invalid state, DuckDBError is returned. If the append is successful, DuckDBSuccess is returned.
Syntax
duckdb_state duckdb_append_data_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk
);
Parameters
• appender
• chunk
• returns
duckdb_query_arrow Executes a SQL query within a connection and stores the full (materialized) result in an arrow structure. If the
query fails to execute, DuckDBError is returned and the error message can be retrieved by calling duckdb_query_arrow_error.
Note that after running duckdb_query_arrow, duckdb_destroy_arrow must be called on the result object even if the query fails,
otherwise the error stored within the result will not be freed correctly.
Syntax
duckdb_state duckdb_query_arrow(
duckdb_connection connection,
const char *query,
duckdb_arrow *out_result
);
Parameters
• connection
• query
• out_result
• returns
duckdb_query_arrow_schema Fetch the internal arrow schema from the arrow result. Remember to call release on the respective
ArrowSchema object.
Syntax
duckdb_state duckdb_query_arrow_schema(
duckdb_arrow result,
duckdb_arrow_schema *out_schema
);
Parameters
• result
• out_schema
• returns
duckdb_prepared_arrow_schema Fetch the internal arrow schema from the prepared statement. Remember to call release on the
respective ArrowSchema object.
Syntax
duckdb_state duckdb_prepared_arrow_schema(
duckdb_prepared_statement prepared,
duckdb_arrow_schema *out_schema
);
Parameters
• prepared
• out_schema
• returns
duckdb_result_arrow_array Convert a data chunk into an arrow struct array. Remember to call release on the respective
ArrowArray object.
Syntax
void duckdb_result_arrow_array(
duckdb_result result,
duckdb_data_chunk chunk,
duckdb_arrow_array *out_array
);
Parameters
• result
The result object the data chunk has been fetched from.
• chunk
• out_array
duckdb_query_arrow_array Fetch an internal arrow struct array from the arrow result. Remember to call release on the respective
ArrowArray object.
This function can be called multiple times to get the next chunks. Each call frees the previous out_array, so consume the out_array
before calling this function again.
Syntax
duckdb_state duckdb_query_arrow_array(
duckdb_arrow result,
duckdb_arrow_array *out_array
);
Parameters
• result
• out_array
• returns
duckdb_arrow_column_count Returns the number of columns present in the arrow result object.
Syntax
idx_t duckdb_arrow_column_count(
duckdb_arrow result
);
Parameters
• result
• returns
duckdb_arrow_row_count Returns the number of rows present in the arrow result object.
Syntax
idx_t duckdb_arrow_row_count(
duckdb_arrow result
);
Parameters
• result
• returns
duckdb_arrow_rows_changed Returns the number of rows changed by the query stored in the arrow result. This is relevant only
for INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.
Syntax
idx_t duckdb_arrow_rows_changed(
duckdb_arrow result
);
Parameters
• result
• returns
duckdb_query_arrow_error Returns the error message contained within the result. The error is only set if
duckdb_query_arrow returns DuckDBError.
The error message should not be freed. It will be de‑allocated when duckdb_destroy_arrow is called.
Syntax
const char *duckdb_query_arrow_error(
duckdb_arrow result
);
Parameters
• result
• returns
duckdb_destroy_arrow Closes the result and de‑allocates all memory allocated for the arrow result.
Syntax
void duckdb_destroy_arrow(
duckdb_arrow *result
);
Parameters
• result
duckdb_destroy_arrow_stream Releases the arrow array stream and de‑allocates its memory.
Syntax
void duckdb_destroy_arrow_stream(
duckdb_arrow_stream *stream_p
);
Parameters
• stream_p
duckdb_execute_prepared_arrow Executes the prepared statement with the given bound parameters, and returns an arrow
query result. Note that after running duckdb_execute_prepared_arrow, duckdb_destroy_arrow must be called on the result
object.
Syntax
duckdb_state duckdb_execute_prepared_arrow(
duckdb_prepared_statement prepared_statement,
duckdb_arrow *out_result
);
Parameters
• prepared_statement
• out_result
• returns
duckdb_arrow_scan Scans the Arrow stream and creates a view with the given name.
Syntax
duckdb_state duckdb_arrow_scan(
duckdb_connection connection,
const char *table_name,
duckdb_arrow_stream arrow
);
Parameters
• connection
• table_name
• arrow
• returns
duckdb_arrow_array_scan Scans the Arrow array and creates a view with the given name. Note that after running
duckdb_arrow_array_scan, duckdb_destroy_arrow_stream must be called on the out stream.
Syntax
duckdb_state duckdb_arrow_array_scan(
duckdb_connection connection,
const char *table_name,
duckdb_arrow_schema arrow_schema,
duckdb_arrow_array arrow_array,
duckdb_arrow_stream *out_stream
);
Parameters
• connection
• table_name
• arrow_schema
• arrow_array
• out_stream
Output array stream that wraps around the passed schema, for releasing/deleting once done.
• returns
duckdb_execute_tasks Execute DuckDB tasks on this thread.
Will return after max_tasks have been executed, or if there are no more tasks present.
Syntax
void duckdb_execute_tasks(
duckdb_database database,
idx_t max_tasks
);
Parameters
• database
• max_tasks
duckdb_create_task_state Creates a task state that can be used with duckdb_execute_tasks_state to execute tasks until
duckdb_finish_execution is called on the state.
Syntax
duckdb_task_state duckdb_create_task_state(
duckdb_database database
);
Parameters
• database
• returns
duckdb_execute_tasks_state Execute DuckDB tasks on this thread.
The thread will keep on executing tasks forever, until duckdb_finish_execution is called on the state. Multiple threads can share the same
duckdb_task_state.
Syntax
void duckdb_execute_tasks_state(
duckdb_task_state state
);
Parameters
• state
duckdb_execute_n_tasks_state Execute DuckDB tasks on this thread.
The thread will keep on executing tasks until either duckdb_finish_execution is called on the state, max_tasks tasks have been executed, or
there are no more tasks to be executed.
Syntax
idx_t duckdb_execute_n_tasks_state(
duckdb_task_state state,
idx_t max_tasks
);
Parameters
• state
• max_tasks
• returns
Syntax
void duckdb_finish_execution(
duckdb_task_state state
);
Parameters
• state
Syntax
bool duckdb_task_state_is_finished(
duckdb_task_state state
);
Parameters
• state
• returns
duckdb_destroy_task_state Destroys the task state returned from duckdb_create_task_state.
Note that this should not be called while there is an active duckdb_execute_tasks_state running on the task state.
Syntax
void duckdb_destroy_task_state(
duckdb_task_state state
);
Parameters
• state
Syntax
bool duckdb_execution_is_finished(
duckdb_connection con
);
Parameters
• con
duckdb_stream_fetch_chunk Fetches a data chunk from the (streaming) duckdb_result. This function should be called repeatedly
until the result is exhausted.
If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy
result functions or the materialized result functions).
It is not known beforehand how many chunks will be returned by this result.
Syntax
duckdb_data_chunk duckdb_stream_fetch_chunk(
duckdb_result result
);
Parameters
• result
• returns
The resulting data chunk. Returns NULL if the result has an error.
C++ API
Installation
The DuckDB C++ API can be installed as part of the libduckdb packages. Please see the installation page for details.
DuckDB implements a custom C++ API. This is built around the abstractions of a database instance (DuckDB class), multiple Connections
to the database instance and QueryResult instances as the result of queries. The header file for the C++ API is duckdb.hpp.
Note. The standard source distribution of libduckdb contains an "amalgamation" of the DuckDB sources, which combines all
sources into two files, duckdb.hpp and duckdb.cpp. The duckdb.hpp header is much larger in this case. Regardless of whether
you are using the amalgamation or not, just include duckdb.hpp.
Startup & Shutdown To use DuckDB, you must first initialize a DuckDB instance using its constructor. DuckDB() takes as parameter
the database file to read and write from. The special value nullptr can be used to create an in‑memory database. Note that for an
in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the process). The second parameter to the DuckDB
constructor is an optional DBConfig object. In DBConfig, you can set various database parameters, for example the read/write mode
or memory limits. The DuckDB constructor may throw exceptions, for example if the database file is not usable.
With the DuckDB instance, you can create one or many Connection instances using the Connection() constructor. While connections
should be thread‑safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection if you
are in a multithreaded environment.
DuckDB db(nullptr);
Connection con(db);
Querying Connections expose the Query() method to send a SQL query string to DuckDB from C++. Query() fully materializes the
query result as a MaterializedQueryResult in memory before returning at which point the query result can be consumed. There is
also a streaming API for queries, see further below.
// create a table
con.Query("CREATE TABLE integers (i INTEGER, j INTEGER)");
The MaterializedQueryResult instance first contains two fields that indicate whether the query was successful. Query will not
throw exceptions under normal circumstances. Instead, invalid queries or other issues will cause the success boolean field in the query
result instance to be set to false. In this case, an error message may be available in error as a string. If the query is successful, other
fields are set: the type of statement that was just executed (e.g., StatementType::INSERT_STATEMENT) is contained in statement_type. The
high‑level ("Logical type"/"SQL type") types of the result set columns are in types. The names of the result columns are in the names
string vector. If multiple result sets are returned, for example because the query contained multiple statements, the result sets can
be chained using the next field.
DuckDB also supports prepared statements in the C++ API with the Prepare() method. This returns an instance of
PreparedStatement. This instance can be used to execute the prepared statement with parameters. Below is an example:
Warning. Do not use prepared statements to insert large amounts of data into DuckDB. See the data import documentation
for better options.
UDF API The UDF API allows the definition of user‑defined functions. It is exposed in duckdb::Connection through the methods
CreateScalarFunction(), CreateVectorizedFunction(), and variants. These methods create UDFs in the temporary
schema (TEMP_SCHEMA) of the owner connection, which is the only connection allowed to use and change them.
CreateScalarFunction The user can code an ordinary scalar function and invoke CreateScalarFunction() to register and
afterward use the UDF in a SELECT statement, for instance:
The CreateScalarFunction() methods automatically create vectorized scalar UDFs, so they are as efficient as built‑in functions.
There are two variants of this method interface:
1.
• template parameters:
This method automatically discovers from the template typenames the corresponding LogicalTypes:
• bool → LogicalType::BOOLEAN
• int8_t → LogicalType::TINYINT
• int16_t → LogicalType::SMALLINT
• int32_t → LogicalType::INTEGER
• int64_t → LogicalType::BIGINT
• float → LogicalType::FLOAT
• double → LogicalType::DOUBLE
• string_t → LogicalType::VARCHAR
In DuckDB, some primitive types correspond to more than one LogicalType; for example, int32_t is used for INTEGER, TIME and DATE.
To disambiguate, users can use the following overloaded method.
2.
int32_t udf_date(int32_t a) {
return a;
}
• template parameters:
This function checks the template types against the LogicalTypes passed as arguments, and they must match as follows:
• LogicalTypeId::BOOLEAN → bool
• LogicalTypeId::TINYINT → int8_t
• LogicalTypeId::SMALLINT → int16_t
• LogicalTypeId::DATE, LogicalTypeId::TIME, LogicalTypeId::INTEGER → int32_t
• LogicalTypeId::BIGINT, LogicalTypeId::TIMESTAMP → int64_t
• LogicalTypeId::FLOAT, LogicalTypeId::DOUBLE, LogicalTypeId::DECIMAL → double
• LogicalTypeId::VARCHAR, LogicalTypeId::CHAR, LogicalTypeId::BLOB → string_t
• LogicalTypeId::VARBINARY → blob_t
/*
* This vectorized function copies the input values to the result vector
*/
template<typename TYPE>
static void udf_vectorized(DataChunk &args, ExpressionState &state, Vector &result) {
// set the result vector type
result.vector_type = VectorType::FLAT_VECTOR;
// get a raw array from the result
auto result_data = FlatVector::GetData<TYPE>(result);
• args is a DataChunk that holds a set of input vectors for the UDF that all have the same length;
• state is an ExpressionState that provides information about the query's expression state;
• result is a Vector to store the result values.
• ConstantVector;
• DictionaryVector;
• FlatVector;
• ListVector;
• StringVector;
• StructVector;
• SequenceVector.
1.
• template parameters:
This method automatically discovers from the template typenames the corresponding LogicalTypes:
• bool → LogicalType::BOOLEAN
• int8_t → LogicalType::TINYINT
• int16_t → LogicalType::SMALLINT
• int32_t → LogicalType::INTEGER
• int64_t → LogicalType::BIGINT
• float → LogicalType::FLOAT
• double → LogicalType::DOUBLE
• string_t → LogicalType::VARCHAR
2.
CLI
CLI API
Installation
The DuckDB CLI (Command Line Interface) is a single, dependency‑free executable. It is precompiled for Windows, Mac, and Linux for both
the stable version and for nightly builds produced by GitHub Actions. Please see the installation page under the CLI tab for download
links.
The DuckDB CLI is based on the SQLite command line shell, so CLI‑client‑specific functionality is similar to what is described in the SQLite
documentation (although DuckDB's SQL syntax follows PostgreSQL conventions).
Note. DuckDB has a tldr page that summarizes the most common uses of the CLI client. If you have tldr installed, you can display
it by running tldr duckdb.
Getting Started
Once the CLI executable has been downloaded, unzip it and save it to any directory. Navigate to that directory in a terminal and enter the
command duckdb to run the executable. If in a PowerShell or POSIX shell environment, use the command ./duckdb instead.
Usage
Options The [OPTIONS] part encodes arguments for the CLI client. Common options include:
For a full list of options, see the command line arguments page.
In‑Memory vs. Persistent Database When no [FILENAME] argument is provided, the DuckDB CLI will open a temporary in‑memory
database. You will see DuckDB's version number, the information on the connection and a prompt starting with a D.
$ duckdb
v0.10.0 20b1486d11
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D
To open or create a persistent database, simply include a path as a command line argument, like
duckdb path/to/my_database.duckdb or duckdb my_database.db.
Running SQL Statements in the CLI Once the CLI has been opened, enter a SQL statement followed by a semicolon, then hit enter and
it will be executed. Results will be displayed in a table in the terminal. If a semicolon is omitted, hitting enter will allow for multi‑line SQL
statements to be entered.
┌───────────┐
│ my_column │
│ varchar │
├───────────┤
│ quack │
└───────────┘
The CLI supports all of DuckDB's rich SQL syntax including SELECT, CREATE, and ALTER statements.
Editor Features The CLI supports autocompletion, and has sophisticated editor features and syntax highlighting on certain platforms.
Exiting the CLI To exit the CLI, press Ctrl‑D if your platform supports it. Otherwise, press Ctrl‑C or use the .exit command. If you used
a persistent database, DuckDB will automatically checkpoint (save the latest edits to disk) and close. This will remove the .wal file (the
Write‑Ahead‑Log) and consolidate all of your data into the single‑file database.
Dot Commands In addition to SQL syntax, special dot commands may be entered into the CLI client. To use one of these commands,
begin the line with a period (.) immediately followed by the name of the command you wish to execute. Additional arguments to the
command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may
be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No
semicolon is required at the end of the line.
Frequently‑used configurations can be stored in the file ~/.duckdbrc, which will be loaded when starting the CLI client. See the
Configuring the CLI section below for further information on these options.
Below, we summarize a few important dot commands. To see all available commands, see the dot commands page or use the .help
command.
Opening Database Files In addition to connecting to a database when opening the CLI, a new database connection can be made by using
the .open command. If no additional parameters are supplied, a new in‑memory database connection is created. This database will not
be persisted when the CLI connection is closed.
.open
The .open command optionally accepts several options, but the final parameter can be used to indicate a path to a persistent database
(or where one should be created). The special string :memory: can also be used to open a temporary in‑memory database.
.open persistent.duckdb
One important option accepted by .open is the --readonly flag. This disallows any editing of the database. To open in read only mode,
the database must already exist. This also means that a new in‑memory database can't be opened in read only mode since in‑memory
databases are created upon connection.
Output Formats The .mode dot command may be used to change the appearance of the tables returned in the terminal output. The
available modes include the default duckbox mode, csv and json mode for ingestion by other tools, markdown and latex for documents,
and insert mode for generating SQL statements.
Writing Results to a File By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using
either the .output or .once commands. For details, see the documentation for the output dot command.
Reading SQL from a File
The DuckDB CLI can read both SQL commands and dot commands from an external file instead of the terminal using the .read command. This allows for a number of commands to be run in sequence and allows command sequences to be saved and reused.
The .read command requires only one argument: the path to the file containing the SQL and/or commands to execute. After running the
commands in the file, control will revert back to the terminal. Output from the execution of that file is governed by the same .output and
.once commands that have been discussed previously. This allows the output to be displayed back to the terminal, as in the first example
below, or out to another file, as in the second example.
In this example, the file select_example.sql is located in the same directory as duckdb.exe and contains the following SQL statement:
SELECT *
FROM generate_series(5);
.read select_example.sql
The output below is returned to the terminal by default. The formatting of the table can be adjusted using the .mode command, and the destination can be changed using the .output or .once commands.
| generate_series |
|-----------------|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Multiple commands, including both SQL and dot commands, can also be run in a single .read command. In this example, the file write_markdown_to_file.sql is located in the same directory as duckdb.exe and contains the following commands:
.mode markdown
.output series.md
SELECT *
FROM generate_series(5);
.read write_markdown_to_file.sql
In this case, no output is returned to the terminal. Instead, the file series.md is created (or replaced if it already existed) with the
markdown‑formatted results shown here:
| generate_series |
|-----------------|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Configuring the CLI
Several dot commands can be used to configure the CLI. On startup, the CLI reads and executes all commands in the file ~/.duckdbrc, including dot commands and SQL statements. This allows you to store the configuration state of the CLI. You may also point to a different initialization file using the -init command line argument.
Setting a Custom Prompt
As an example, a file in the same directory as the DuckDB CLI named prompt.sql will change the DuckDB prompt to be a duck head and run a SQL statement. Note that the duck head is built with Unicode characters and does not work in all terminal environments (e.g., in Windows, unless running with WSL and using the Windows Terminal).
This outputs:
Non‑Interactive Usage
To read/process a file and exit immediately, pipe the file contents in to duckdb:
To execute a command with SQL text passed in directly from the command line, call duckdb with two arguments: the database location
(or :memory:), and a string with the SQL statement to execute.
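The two-argument invocation described above can also be driven from a script. The following Python sketch only assembles the argument list named in the text (the database location, then the SQL string) and runs it if a duckdb executable happens to be on the PATH. The helper names build_duckdb_command and run_sql and the sample query are illustrative assumptions, not part of the DuckDB CLI.

```python
import shutil
import subprocess


def build_duckdb_command(sql, database=":memory:"):
    """Return the argument list: executable, database location, SQL text."""
    return ["duckdb", database, sql]


def run_sql(sql, database=":memory:"):
    """Run one statement non-interactively.

    Returns the captured standard output, or None when no duckdb
    executable can be found on the PATH.
    """
    if shutil.which("duckdb") is None:
        return None
    result = subprocess.run(
        build_duckdb_command(sql, database),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Show the command that would be executed for a sample query.
print(build_duckdb_command("SELECT 42 AS the_answer;"))
```

Passing the SQL text as a single list element avoids shell quoting issues, because no shell is involved in the call.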
Loading Extensions
To load extensions, use DuckDB's SQL INSTALL and LOAD commands as you would other SQL statements.
INSTALL fts;
LOAD fts;
When in a Unix environment, it can be useful to pipe data between multiple commands. DuckDB is able to read data from stdin as well
as write to stdout using the file location of stdin (/dev/stdin) and stdout (/dev/stdout) within SQL commands, as pipes act very
similarly to file handles.
First, read a file and pipe it to the duckdb CLI executable. As arguments to the DuckDB CLI, pass in the location of the database to open,
in this case, an in‑memory database, and a SQL command that utilizes /dev/stdin as a file location.
┌───────┐
│ woot │
│ int32 │
├───────┤
│ 42 │
│ 43 │
└───────┘
To write back to stdout, the COPY statement can be used with the /dev/stdout file location.
$ cat test.csv | duckdb :memory: "COPY (SELECT * FROM read_csv('/dev/stdin')) TO '/dev/stdout' WITH
(FORMAT 'csv', HEADER)"
woot
42
43
Examples
To retrieve the home directory's path from the HOME environment variable, use:
┌──────────────────┐
│ home │
│ varchar │
├──────────────────┤
│ /Users/user_name │
└──────────────────┘
The output of the getenv function can be used to set configuration options. For example, to set the NULL order based on the environment
variable DEFAULT_NULL_ORDER, use:
Restrictions for Reading Environment Variables
The getenv function can only be run when the enable_external_access option is set to true (the default setting). It is only available in the CLI client and is not supported in other DuckDB clients.
Prepared Statements
The DuckDB CLI supports executing prepared statements in addition to regular SELECT statements. To create and execute a prepared
statement in the CLI client, use the PREPARE clause and the EXECUTE statement.
The table below summarizes DuckDB's command line options. To list all command line options, use the command duckdb -help. For a list of dot commands available in the CLI shell, see the Dot Commands page.
Argument Description
Dot Commands
Dot commands are available in the DuckDB CLI client. To use one of these commands, begin the line with a period (.) immediately followed
by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command.
If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered
on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line. To see available commands,
use the .help command.
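The quoting rule above (space-separated arguments, with single or double quotes around an argument that contains a space) can be illustrated with a small sketch. It uses Python's shlex module as a stand-in tokenizer. This is an assumption for illustration only: the CLI's own parser is not shown in this document and may treat edge cases such as backslashes differently. The function name split_dot_command is hypothetical.

```python
import shlex


def split_dot_command(line):
    """Split a dot command line into (command, list of arguments)."""
    # A dot command must start with the period: no leading whitespace.
    if not line.startswith("."):
        raise ValueError("a dot command must begin with a period")
    parts = shlex.split(line)
    return parts[0], parts[1:]


# Quotes keep an argument containing a space together.
print(split_dot_command(".output 'my results.md'"))
# Double quotes work the same way.
print(split_dot_command('.separator "|"'))
```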
Dot Commands
Command Description
The .help text may be filtered by passing in a text string as the second argument.
.help m
.maxrows COUNT Sets the maximum number of rows for display (default: 40). Only for duckbox mode.
.maxwidth COUNT Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox
mode.
.mode MODE ?TABLE? Set output mode
.output: Writing Results to a File
By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the .output or .once commands. Pass in the desired output file location as a parameter. The .once command redirects only the next set of results and then reverts to standard output, whereas .output redirects all subsequent output to that file location. Note that each result will overwrite the entire file at that destination. To revert to standard output, enter .output with no file parameter.
In this example, the output format is changed to markdown, the destination is identified as a Markdown file, and then DuckDB will write
the output of the SQL statement to that file. Output is then reverted to standard output using .output with no parameter.
.mode markdown
.output my_results.md
SELECT 'taking flight' AS output_column;
.output
SELECT 'back to the terminal' AS displayed_column;
| output_column |
|---------------|
| taking flight |
| displayed_column |
|----------------------|
| back to the terminal |
A common output format is CSV, or comma-separated values. DuckDB supports SQL syntax to export data as CSV or Parquet, but the CLI-specific commands may be used to write a CSV instead if desired.
.mode csv
.once my_output_file.csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
col_1,col_2
1,2
10,20
By passing special options (flags) to the .once command, query results can also be sent to a temporary file and automatically opened in
the user's default program. Use either the -e flag for a text file (opened in the default text editor), or the -x flag for a CSV file (opened in
the default spreadsheet editor). This is useful for more detailed inspection of query results, especially if there is a relatively large result set.
The .excel command is equivalent to .once -x.
.once -e
SELECT 'quack' AS hello;
The results then open in the default text file editor of the system, for example:
All DuckDB clients support querying the database schema with SQL, but the CLI has additional dot commands that can make it easier to
understand the contents of a database. The .tables command will return a list of tables in the database. It has an optional argument
that will filter the results according to a LIKE pattern.
For example, to filter to only tables that contain an "l", use the LIKE pattern %l%.
.tables %l%
fliers walkers
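To make the LIKE filtering concrete, the following Python sketch translates a LIKE pattern into a regular expression and applies it to a list of table names. It is a simplified model rather than DuckDB's implementation: escape characters are ignored, and the table name swimmers is a hypothetical addition used to show a non-matching entry.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression.

    % matches any sequence of characters and _ matches exactly one.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def filter_tables(tables, pattern):
    """Keep only the table names that match the whole LIKE pattern."""
    regex = like_to_regex(pattern)
    return [name for name in tables if regex.fullmatch(name)]


tables = ["fliers", "swimmers", "walkers"]
print(filter_tables(tables, "%l%"))
```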
The .schema command will show all of the SQL statements used to define the schema of the database.
.schema
By default, the shell includes support for syntax highlighting. The CLI's syntax highlighter can be configured using the following commands.
.highlight on
.highlight off
.constant
[red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta
.constantcode [terminal_code]
.keyword
[red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta
.keywordcode [terminal_code]
Note. Deprecated: this feature is only included for compatibility reasons and may be removed in the future. Use the read_csv function or the COPY statement to load CSV files.
DuckDB supports SQL syntax to directly query or import CSV files, but the CLI‑specific commands may be used to import a CSV instead if
desired. The .import command takes two arguments and also supports several options. The first argument is the path to the CSV file,
and the second is the name of the DuckDB table to create. Since DuckDB requires stricter typing than SQLite (upon which the DuckDB CLI
is based), the destination table must be created before using the .import command. To automatically detect the schema and create a
table from a CSV, see the read_csv examples in the import docs.
In this example, a CSV file is generated by changing to CSV mode and setting an output file location:
.mode csv
.output import_example.csv
SELECT 1 AS col_1, 2 AS col_2 UNION ALL SELECT 10 AS col_1, 20 AS col_2;
Now that the CSV has been written, a table can be created with the desired schema and the CSV can be imported. The output is first reset to the terminal so that subsequent results are not written to the output file specified above. The --skip N option is used to ignore the first row of the file, since it is a header row and the table has already been created with the correct column names.
.mode csv
.output
CREATE TABLE test_table (col_1 INT, col_2 INT);
.import import_example.csv test_table --skip 1
Note that the .import command utilizes the current .mode and .separator settings when identifying the structure of the data to
import. The --csv option can be used to override that behavior.
Output Formats
The .mode dot command may be used to change the appearance of the tables returned in the terminal output. In addition to customizing
the appearance, these modes have additional benefits. This can be useful for presenting DuckDB output elsewhere by redirecting the
terminal output to a file. Using the insert mode will build a series of SQL statements that can be used to insert the data at a later point.
The markdown mode is particularly useful for building documentation and the latex mode is useful for writing academic papers.
Mode Description
.mode markdown
SELECT 'quacking intensifies' AS incoming_ducks;
| incoming_ducks |
|----------------------|
| quacking intensifies |
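As a rough illustration of what markdown mode produces, the following Python sketch renders column names and rows into the same pipe-delimited layout. It is an independent approximation, not the CLI's renderer: every cell is left-aligned here, and the CLI may align values differently. The function name to_markdown is hypothetical.

```python
def to_markdown(columns, rows):
    """Render column names and rows as a Markdown table string."""
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest entry, header included.
    widths = [
        max([len(name)] + [len(row[i]) for row in cells])
        for i, name in enumerate(columns)
    ]

    def format_row(values):
        padded = (value.ljust(width) for value, width in zip(values, widths))
        return "| " + " | ".join(padded) + " |"

    separator = "|" + "|".join("-" * (width + 2) for width in widths) + "|"
    lines = [format_row(columns), separator]
    lines.extend(format_row(row) for row in cells)
    return "\n".join(lines)


print(to_markdown(["incoming_ducks"], [["quacking intensifies"]]))
```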
The output appearance can also be adjusted with the .separator command. If using an export mode that relies on a separator (csv or
tabs for example), the separator will be reset when the mode is changed. For example, .mode csv will set the separator to a comma (,).
Using .separator "|" will then convert the output to be pipe‑separated.
.mode csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
col_1,col_2
1,2
10,20
.separator "|"
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
col_1|col_2
1|2
10|20
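The effect of changing the separator can be mimicked outside the CLI. This Python sketch writes the same three rows with the standard csv module, first with a comma and then with a pipe as the delimiter. It only models the output format shown above; the function name render is an assumption and nothing here calls DuckDB.

```python
import csv
import io


def render(rows, separator=","):
    """Serialize rows to text using the given single-character separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [("col_1", "col_2"), (1, 2), (10, 20)]
print(render(rows), end="")                 # comma-separated
print(render(rows, separator="|"), end="")  # pipe-separated
```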
Editing
Note. The linenoise‑based CLI editor is currently only available for macOS and Linux.
DuckDB's CLI uses a line-editing library based on linenoise, whose shortcuts are based on the Emacs mode of readline. Below is a list of available commands.
Moving
Key Action
History
Key Action
Changing Text
Key Action
Completing
Key Action
Miscellaneous
Key Action
Enter Execute query. If query is not complete, insert a newline at the end of the buffer
Ctrl+J Execute query. If query is not complete, insert a newline at the end of the buffer
Ctrl+C Cancel editing of current query
Ctrl+G Cancel editing of current query
Ctrl+L Clear screen
Ctrl+O Cancel editing of current query
Ctrl+X Insert a newline after the cursor
Ctrl+Z Suspend CLI and return to shell, use fg to re‑open
Using Read‑Line
If you prefer, you can use rlwrap to use read‑line directly with the shell. Then, use Shift+Enter to insert a newline and Enter to execute
the query:
Autocomplete
The shell offers context-aware autocomplete of SQL queries through the autocomplete extension. Autocomplete is triggered by pressing Tab.
Multiple autocomplete suggestions can be present. You can cycle forwards through the suggestions by repeatedly pressing Tab, or backwards by pressing Shift+Tab. Autocompletion can be reverted by pressing ESC twice.
The shell autocompletes the following groups:
• Keywords
• Table names and table functions
• Column names and scalar functions
• File names
The shell looks at the position in the SQL statement to determine which of these autocompletions to trigger. For example:
Syntax Highlighting
Note. Syntax highlighting in the CLI is currently only available for macOS and Linux.
SQL queries written in the shell are highlighted automatically.
There are several components of a query that are highlighted in different colors. The colors can be configured using dot commands. Syntax
highlighting can also be disabled entirely using the .highlight off command.
The components can be configured using either a supported color name (e.g., .keyword red), or by directly providing a terminal code to
use for rendering (e.g., .keywordcode \033[31m). Below is a list of supported color names and their corresponding terminal codes.
red \033[31m
green \033[32m
yellow \033[33m
blue \033[34m
magenta \033[35m
cyan \033[36m
white \033[37m
brightblack \033[90m
brightred \033[91m
brightgreen \033[92m
brightyellow \033[93m
brightblue \033[94m
brightmagenta \033[95m
brightcyan \033[96m
brightwhite \033[97m
.keyword brightred
.constant brightwhite
.comment cyan
.error yellow
.cont blue
.cont_sel brightblue
If you wish to start up the CLI with a different set of colors every time, you can place these commands in the ~/.duckdbrc file that is
loaded on start‑up of the CLI.
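To see what the terminal codes listed above do, the following Python sketch wraps text in one of them and then resets the terminal attributes. The mapping is copied from the list above. The reset sequence \033[0m is the standard ANSI reset and is not taken from this document, and the function name colorize is hypothetical.

```python
# Color names and terminal codes, as listed in the table above.
COLOR_CODES = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "blue": "\033[34m",
    "magenta": "\033[35m",
    "cyan": "\033[36m",
    "white": "\033[37m",
    "brightblack": "\033[90m",
    "brightred": "\033[91m",
    "brightgreen": "\033[92m",
    "brightyellow": "\033[93m",
    "brightblue": "\033[94m",
    "brightmagenta": "\033[95m",
    "brightcyan": "\033[96m",
    "brightwhite": "\033[97m",
}

# Standard ANSI sequence that restores the default rendering.
RESET = "\033[0m"


def colorize(text, color):
    """Wrap text in the terminal code for the named color."""
    return COLOR_CODES[color] + text + RESET


# Roughly how a keyword and a constant might be rendered.
print(colorize("SELECT", "brightred"), colorize("'quack'", "brightwhite"))
```

On a terminal that does not interpret these sequences, the raw escape characters are printed instead of colors.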
Error Highlighting
The shell has support for highlighting certain errors. In particular, mismatched brackets and unclosed quotes are highlighted in red (or
another color if specified). This highlighting is automatically disabled for large queries. In addition, it can be disabled manually using the
.render_errors off command.
Go
The DuckDB Go driver, go-duckdb, allows using DuckDB via the database/sql interface. For examples on how to use this interface,
see the official documentation and tutorial.
Installation
go get github.com/marcboeker/go-duckdb
Importing
To import the DuckDB Go package, add the following entries to your imports:
import (
"database/sql"
_ "github.com/marcboeker/go-duckdb"
)
Appender
The DuckDB Go client supports the DuckDB Appender API for bulk inserts. You can obtain a new Appender by supplying a DuckDB connection to NewAppenderFromConn(). For example:
// Retrieve appender from connection (note that you have to create the table 'test' beforehand).
appender, err := NewAppenderFromConn(conn, "", "test")
if err != nil {
...
}
defer appender.Close()
err = appender.AppendRow(...)
if err != nil {
...
}
Examples
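package main

import (
	"database/sql"
	"errors"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	// An empty data source name opens an in-memory database.
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Create and populate the table so that the query below has a row to read.
	_, err = db.Exec(`CREATE TABLE people (id INTEGER, name VARCHAR)`)
	if err != nil {
		log.Fatal(err)
	}
	_, err = db.Exec(`INSERT INTO people VALUES (42, 'John')`)
	if err != nil {
		log.Fatal(err)
	}

	var (
		id   int
		name string
	)
	row := db.QueryRow(`SELECT id, name FROM people`)
	err = row.Scan(&id, &name)
	if errors.Is(err, sql.ErrNoRows) {
		log.Println("no rows")
	} else if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("id: %d, name: %s\n", id, name)
}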
More Examples
For more examples, see the examples in the go-duckdb repository.
Java JDBC API
Installation
The DuckDB Java JDBC API can be installed from Maven Central. Please see the installation page for details.
DuckDB's JDBC API implements the main parts of the standard Java Database Connectivity (JDBC) API, version 4.1. Describing JDBC is beyond the scope of this page; see the official documentation for details. Below, we focus on the DuckDB-specific parts.
Refer to the externally hosted API Reference for more information about our extensions to the JDBC specification, or see the Arrow Methods below.
Startup & Shutdown
In JDBC, database connections are created through the standard java.sql.DriverManager class. The driver should auto-register in the DriverManager. If that does not work for some reason, you can enforce registration like so:
Class.forName("org.duckdb.DuckDBDriver");
To create a DuckDB connection, call DriverManager with the jdbc:duckdb: JDBC URL prefix, like so:
import java.sql.Connection;
import java.sql.DriverManager;
Connection conn = DriverManager.getConnection("jdbc:duckdb:");
To use DuckDB-specific features such as the Appender, cast the object to a DuckDBConnection:
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
When using the jdbc:duckdb: URL alone, an in-memory database is created. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Java program). If you would like to access or create a persistent database, append its file path after the prefix. For example, if your database is stored in /tmp/my_database, use the JDBC URL jdbc:duckdb:/tmp/my_database to create a connection to it.
It is possible to open a DuckDB database file in read‑only mode. This is for example useful if multiple Java processes want to read the same
database file at the same time. To open an existing database file in read‑only mode, set the connection property duckdb.read_only
like so:
Additional connections can be created using the DriverManager. A more efficient mechanism is to call the DuckDBConnection#duplicate() method like so:
Multiple connections are allowed, but mixing read‑write and read‑only connections is unsupported.
Configuring Connections Configuration options can be provided to change different settings of the database system. Note that many
of these settings can be changed later on using PRAGMA statements as well.
Querying
DuckDB supports the standard JDBC methods to send queries and retrieve result sets. First, a Statement object has to be created from the Connection; this object can then be used to send queries using execute() and executeQuery(). execute() is meant for statements where no results are expected, such as CREATE TABLE or UPDATE, and executeQuery() is meant for queries that produce results (e.g., SELECT). Examples follow. See also the JDBC Statement and ResultSet documentation.
// create a table
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
// insert two items into the table
stmt.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
stmt.close();
Note. Warning Do not use prepared statements to insert large amounts of data into DuckDB. See the data import documentation
for better options.
Arrow Export
The following demonstrates exporting an Arrow stream and consuming it using the Java Arrow bindings.
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.duckdb.DuckDBResultSet;
Arrow Import
The following demonstrates consuming an Arrow stream from the Java Arrow bindings.
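import java.sql.DriverManager;
import org.apache.arrow.c.ArrowArrayStream;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.duckdb.DuckDBConnection;
import org.duckdb.DuckDBResultSet;

// Arrow stuff
try (var allocator = new RootAllocator();
     ArrowStreamReader reader = null; // should not be null of course
     var arrow_array_stream = ArrowArrayStream.allocateNew(allocator)) {
    Data.exportArrayStream(allocator, reader, arrow_array_stream);

    // DuckDB stuff
    try (var conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {
        // register the stream under the name that the query below refers to
        conn.registerArrowStream("asdf", arrow_array_stream);

        // run a query
        try (var stmt = conn.createStatement();
             var rs = (DuckDBResultSet) stmt.executeQuery("SELECT count(*) FROM asdf")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}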
Streaming Results
Result streaming is opt-in in the JDBC driver: set the jdbc_stream_results config to true before running a query. The easiest way to do that is to pass it in the Properties object.
Appender
The Appender is available in the DuckDB JDBC driver via the org.duckdb.DuckDBAppender class. The constructor of the class requires the schema name and the table name it is applied to. The Appender is flushed when the close() method is called.
Example:
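import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
// create the target table before appending to it
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE tbl (x BIGINT, y FLOAT, s VARCHAR)");

// using try-with-resources to automatically close the appender at the end of the scope
try (var appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "tbl")) {
    appender.beginRow();
    appender.append(10);
    appender.append(3.2);
    appender.append("hello");
    appender.endRow();
    appender.beginRow();
    appender.append(20);
    appender.append(-8.1);
    appender.append("world");
    appender.endRow();
}
stmt.close();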
Batch Writer
The DuckDB JDBC driver offers batch write functionality. The batch writer supports prepared statements to mitigate the overhead of query parsing.
Note. The preferred method for bulk inserts is to use the Appender due to its higher performance. However, when using the Appender is not possible, the batch writer is available as an alternative.
stmt.setObject(1, 1);
stmt.setObject(2, 2);
stmt.setObject(3, 3);
stmt.addBatch();
stmt.setObject(1, 4);
stmt.setObject(2, 5);
stmt.setObject(3, 6);
stmt.addBatch();
stmt.executeBatch();
stmt.close();
Batch Writer with Vanilla Statements
The batch writer also supports vanilla SQL statements:
import org.duckdb.DuckDBConnection;
stmt.executeBatch();
stmt.close();
Julia Package
The DuckDB Julia package provides a high‑performance front‑end for DuckDB. Much like SQLite, DuckDB runs in‑process within the Julia
client, and provides a DBInterface front‑end.
The package also supports multi-threaded execution. It uses Julia threads/tasks for this purpose. If you wish to run queries in parallel, you must launch Julia with multi-threading support (e.g., by setting the JULIA_NUM_THREADS environment variable).
Installation
using Pkg
Pkg.add("DuckDB")
Alternatively, enter the package manager using the ] key, and issue the following command:
Basics
using DuckDB
# create a table
DBInterface.execute(con, "CREATE TABLE integers (i INTEGER)")
Scanning DataFrames
The DuckDB Julia package also provides support for querying Julia DataFrames. Note that the DataFrames are directly read by DuckDB ‑
they are not inserted or copied into the database itself.
If you wish to load data from a DataFrame into a DuckDB table you can run a CREATE TABLE ... AS or INSERT INTO query.
using DuckDB
using DataFrames
# create a DataFrame
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])
Appender API
The DuckDB Julia package also supports the Appender API, which is much faster than using prepared statements or individual INSERT INTO statements. Appends are made in row-wise format. For every column, an append() call should be made, after which the row should be finished by calling end_row(). After all rows have been appended, call flush() to flush the appender and then close() to finalize it and clean up the resulting memory.
for j in i
DuckDB.append(appender, j)
end
DuckDB.end_row(appender)
end
# flush the appender after all rows
DuckDB.flush(appender)
DuckDB.close(appender)
Concurrency
Within a Julia process, tasks are able to concurrently read and write to the database, as long as each task maintains its own connection to the database. In the example below, a single task is spawned to periodically read the database and many tasks are spawned to write to the database using both INSERT statements and the Appender API.
function run_reader(db)
# create a DuckDB connection specifically for this task
conn = DBInterface.connect(db)
while true
println(DBInterface.execute(conn,
"SELECT id, count(date) as count, max(date) as max_date
FROM data group by id order by id") |> DataFrames.DataFrame)
Threads.sleep(1)
end
DBInterface.close(conn)
end
# spawn one reader task
Threads.@spawn run_reader(db)
DuckDB.flush(appender);
end
DuckDB.close(appender);
end
# spawn many appender tasks
for i in 1:100
Threads.@spawn run_appender(db, 2)
end
Node.js
Node.js API
This package provides a Node.js API for DuckDB. The API for this client is somewhat compliant with the SQLite Node.js client for easier transition.
Initializing
All options described on the Database configuration page can (optionally) be supplied to the Database constructor as the second argument. A third argument can optionally be supplied to get feedback on the given options.
Running a Query
The following code snippet runs a simple query using the Database.all() method.
Other available methods are each, where the callback is invoked for each row, run to execute a single statement without results and
exec, which can execute several SQL commands at once but also does not return results. All those commands can work with prepared
statements, taking the values for the parameters as additional arguments. For example like so:
db.all('SELECT ?::INTEGER AS fortytwo, ?::STRING AS hello', 42, 'Hello, World', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
console.log(res[0].hello)
});
Connections
A database can have multiple Connections, which are created using db.connect().
You can create multiple connections, each with their own transaction context.
Connection objects also contain shorthands to directly call run(), all() and each() with parameters and callbacks, respectively,
for example:
Prepared Statements
From connections, you can create prepared statements (and only that) using con.prepare():
To execute this statement, you can call for example all() on the stmt object:
You can also execute the prepared statement multiple times. This is for example useful to fill a table with data:
console.log(res)
}
});
prepare() can also take a callback which gets the prepared statement as an argument:
Apache Arrow can be used to insert data into DuckDB without making a copy:
const jsonData = [
{"userId":1,"id":1,"title":"delectus aut autem","completed":false},
{"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}
];
Node.js API
Modules
Typedefs
duckdb
• duckdb
– ~Connection
* .sql ⇒
* .get()
* .run(sql, ...params, callback) ⇒ void
* .all(sql, ...params, callback) ⇒ void
* .arrowIPCAll(sql, ...params, callback) ⇒ void
* .each(sql, ...params, callback) ⇒ void
* .finalize(sql, ...params, callback) ⇒ void
* .stream(sql, ...params)
* .columns() ⇒ Array.<ColumnInfo>
– ~QueryResult
* .nextChunk() ⇒
* .nextIpcBuffer() ⇒
* .asyncIterator()
– ~Database
* .close(callback) ⇒ void
* .close_internal(callback) ⇒ void
* .wait(callback) ⇒ void
* .serialize(callback) ⇒ void
* .parallelize(callback) ⇒ void
* .connect(path) ⇒ Connection
* .interrupt(callback) ⇒ void
* .prepare(sql) ⇒ Statement
* .run(sql, ...params, callback) ⇒ void
* .scanArrowIpc(sql, ...params, callback) ⇒ void
* .each(sql, ...params, callback) ⇒ void
* .all(sql, ...params, callback) ⇒ void
• ~Connection
connection.run(sql, ...params, callback) ⇒ void
Run a SQL statement and trigger a callback when done
Param Type
sql
...params *
callback
connection.all(sql, ...params, callback) ⇒ void
Run a SQL query and trigger the callback once for all result rows
Param Type
sql
...params *
callback
connection.arrowIPCAll(sql, ...params, callback) ⇒ void
Run a SQL query and serialize the result into the Apache Arrow IPC format (requires arrow extension to be loaded)
Param Type
sql
...params *
callback
connection.arrowIPCStream(sql, ...params, callback) ⇒
Run a SQL query and return an IpcResultStreamIterator that allows streaming the result into the Apache Arrow IPC format (requires arrow extension to be loaded)
Param Type
sql
...params *
callback
connection.each(sql, ...params, callback) ⇒ void
Runs a SQL query and triggers the callback for each result row
Param Type
sql
...params *
callback
Param Type
sql
...params *
Param
name
return_type
fun
Param Type
sql
...params *
callback
Param Type
sql
...params *
callback
Param
name
return_type
callback
Param
name
return_type
callback
connection.register_buffer(name, array, force, callback) ⇒ void
Register a Buffer to be scanned using the Apache Arrow IPC scanner (requires arrow extension to be loaded)
Param
name
array
force
callback
Param
name
callback
Param
callback
• ~Statement
– .sql ⇒
– .get()
– .run(sql, ...params, callback) ⇒ void
– .all(sql, ...params, callback) ⇒ void
– .arrowIPCAll(sql, ...params, callback) ⇒ void
– .each(sql, ...params, callback) ⇒ void
– .finalize(sql, ...params, callback) ⇒ void
– .stream(sql, ...params)
– .columns() ⇒ Array.<ColumnInfo>
Param Type
sql
...params *
callback
Param Type
sql
...params *
callback
Param Type
sql
...params *
callback
Param Type
sql
...params *
callback
Param Type
sql
...params *
callback
Param Type
sql
...params *
• ~QueryResult
– .nextChunk() ⇒
– .nextIpcBuffer() ⇒
– .asyncIterator()
queryResult.nextIpcBuffer() ⇒
Function to fetch the next result blob of an Arrow IPC Stream in a zero-copy way (requires arrow extension to be loaded)
Param Description
• ~Database
– .close(callback) ⇒ void
– .close_internal(callback) ⇒ void
– .wait(callback) ⇒ void
– .serialize(callback) ⇒ void
– .parallelize(callback) ⇒ void
– .connect(path) ⇒ Connection
– .interrupt(callback) ⇒ void
– .prepare(sql) ⇒ Statement
– .run(sql, ...params, callback) ⇒ void
– .scanArrowIpc(sql, ...params, callback) ⇒ void
– .each(sql, ...params, callback) ⇒ void
– .all(sql, ...params, callback) ⇒ void
– .arrowIPCAll(sql, ...params, callback) ⇒ void
– .arrowIPCStream(sql, ...params, callback) ⇒ void
– .exec(sql, ...params, callback) ⇒ void
– .register_udf(name, return_type, fun) ⇒ this
– .register_buffer(name) ⇒ this
– .unregister_buffer(name) ⇒ this
– .unregister_udf(name) ⇒ this
– .registerReplacementScan(fun) ⇒ this
– .tokenize(text) ⇒ ScriptTokens
– .get()
Param
callback
Param
callback
database.wait(callback) ⇒ void
Triggers callback when all scheduled database tasks have completed.
Param
callback
Param
callback
Param
callback
Param Description
database.interrupt(callback) ⇒ void
Supposedly interrupts queries, but currently does not do anything.
Param
callback
Param
sql
database.run(sql, ...params, callback) ⇒ void
Convenience method for Connection#run using a built-in default connection
Param Type
sql
...params *
callback
database.scanArrowIpc(sql, ...params, callback) ⇒ void
Convenience method for Connection#scanArrowIpc using a built-in default connection
Param Type
sql
...params *
callback
Param Type
sql
...params *
callback
database.all(sql, ...params, callback) ⇒ void
Convenience method for Connection#all using a built-in default connection
Param Type
sql
...params *
callback
database.arrowIPCAll(sql, ...params, callback) ⇒ void
Convenience method for Connection#arrowIPCAll using a built-in default connection
Param Type
sql
...params *
callback
database.arrowIPCStream(sql, ...params, callback) ⇒ void
Convenience method for Connection#arrowIPCStream using a built-in default connection
Param Type
sql
...params *
callback
Param Type
sql
...params *
callback
Param
name
return_type
fun
database.register_buffer(name) ⇒ this Register a buffer containing serialized data to be scanned from DuckDB.
Param
name
Param
name
Param
name
Param Description
Param
text
duckdb~ERROR : number Check that errno attribute equals this to check for a duckdb error
ColumnInfo : object
TypeInfo : object
id string Type ID
[alias] string SQL type alias
sql_type string SQL type name
DuckDbError : object
HTTPError : object
Python
Python API
Installation
The DuckDB Python API can be installed using pip: pip install duckdb. Please see the installation page for details. It is also possible
to install DuckDB using conda: conda install python-duckdb -c conda-forge.
The most straightforward way to run SQL queries with DuckDB is the duckdb.sql command.
import duckdb
duckdb.sql("SELECT 42").show()
This will run queries using an in‑memory database that is stored globally inside the Python module. The result of the query is returned
as a Relation. A relation is a symbolic representation of the query. The query is not executed until the result is fetched or requested to be
printed to the screen.
Relations can be referenced in subsequent queries by storing them inside variables, and using them as tables. This way queries can be
constructed incrementally.
import duckdb
r1 = duckdb.sql("SELECT 42 AS i")
duckdb.sql("SELECT i * 2 AS k FROM r1").show()
Data Input
DuckDB can ingest data from a wide variety of formats – both on‑disk and in‑memory. See the data ingestion page for more information.
import duckdb
duckdb.read_csv("example.csv") # read a CSV file into a Relation
duckdb.read_parquet("example.parquet") # read a Parquet file into a Relation
duckdb.read_json("example.json") # read a JSON file into a Relation
DataFrames DuckDB can also directly query Pandas DataFrames, Polars DataFrames and Arrow tables.
import duckdb
Result Conversion
DuckDB supports converting query results efficiently to a variety of formats. See the result conversion page for more information.
import duckdb
duckdb.sql("SELECT 42").fetchall() # Python objects
duckdb.sql("SELECT 42").df() # Pandas DataFrame
duckdb.sql("SELECT 42").pl() # Polars DataFrame
duckdb.sql("SELECT 42").arrow() # Arrow Table
duckdb.sql("SELECT 42").fetchnumpy() # NumPy Arrays
DuckDB supports writing Relation objects directly to disk in a variety of formats. The COPY statement can be used to write data to disk
using SQL as an alternative.
import duckdb
duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv") # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'") # Copy to a Parquet file
When using DuckDB through duckdb.sql(), it operates on an in‑memory database, i.e., no tables are persisted on disk. Invoking the
duckdb.connect() method without arguments returns a connection, which also uses an in‑memory database:
import duckdb
con = duckdb.connect()
con.sql("SELECT 42 AS x").show()
Persistent Storage
Calling duckdb.connect(dbname) creates a connection to a persistent database. Any data written to that connection will be persisted,
and can be reloaded by re‑connecting to the same file, both from Python and from other DuckDB clients.
import duckdb
con = duckdb.connect("file.db")
# create a table and load data into it
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
# query the table
con.table("test").show()
# explicitly close the connection
con.close()
# Note: connections are also closed implicitly when they go out of scope
You can also use a context manager to ensure that the connection is closed:
import duckdb
The connection object and the duckdb module can be used interchangeably – they support the same methods. The only difference is that
when using the duckdb module a global in‑memory database is used.
Note that if you are developing a package designed for others to use, and use DuckDB in the package, it is recommended that you create
connection objects instead of using the methods on the duckdb module. That is because the duckdb module uses a shared global database,
which can cause hard-to-debug issues if used from within multiple different packages.
The DuckDBPyConnection object is not thread‑safe. If you would like to write to the same database from multiple threads, create a
cursor for each thread with the DuckDBPyConnection.cursor() method.
DuckDB's Python API provides functions for installing and loading extensions, which perform the equivalent operations to running the
INSTALL and LOAD SQL commands, respectively. An example that installs and loads the spatial extension looks as follows:
import duckdb
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
Note. To load unsigned extensions, add the config = {"allow_unsigned_extensions": "true"} argument to the
duckdb.connect() method.
Data Ingestion
CSV Files
CSV files can be read using the read_csv function, called either from within Python or directly from within SQL. By default, the
read_csv function attempts to auto-detect the CSV settings by sampling from the provided file.
import duckdb
# read from a file using fully auto-detected settings
duckdb.read_csv("example.csv")
# read multiple CSV files from a folder
duckdb.read_csv("folder/*.csv")
# specify options on how the CSV is formatted internally
duckdb.read_csv("example.csv", header = False, sep = ",")
# override types of the first two columns
duckdb.read_csv("example.csv", dtype = ["int", "varchar"])
# use the (experimental) parallel CSV reader
duckdb.read_csv("example.csv", parallel = True)
# directly read a CSV file from within SQL
duckdb.sql("SELECT * FROM 'example.csv'")
# call read_csv from within SQL
duckdb.sql("SELECT * FROM read_csv('example.csv')")
Parquet Files
Parquet files can be read using the read_parquet function, called either from within Python or directly from within SQL.
import duckdb
# read from a single Parquet file
duckdb.read_parquet("example.parquet")
# read multiple Parquet files from a folder
duckdb.read_parquet("folder/*.parquet")
# read a Parquet file over HTTPS
duckdb.read_parquet("https://some.url/some_file.parquet")
# read a list of Parquet files
duckdb.read_parquet(["file1.parquet", "file2.parquet", "file3.parquet"])
# directly read a Parquet file from within SQL
duckdb.sql("SELECT * FROM 'example.parquet'")
# call read_parquet from within SQL
duckdb.sql("SELECT * FROM read_parquet('example.parquet')")
JSON Files
JSON files can be read using the read_json function, called either from within Python or directly from within SQL. By default, the
read_json function will automatically detect if a file contains newline-delimited JSON or regular JSON, and will detect the schema of the
objects stored within the JSON file.
import duckdb
# read from a single JSON file
duckdb.read_json("example.json")
# read multiple JSON files from a folder
duckdb.read_json("folder/*.json")
# directly read a JSON file from within SQL
duckdb.sql("SELECT * FROM 'example.json'")
# call read_json from within SQL
duckdb.sql("SELECT * FROM read_json_auto('example.json')")
DuckDB is automatically able to query a Pandas DataFrame, Polars DataFrame, or Arrow object that is stored in a Python variable by name.
Accessing these is made possible by replacement scans.
DuckDB supports querying multiple types of Apache Arrow objects including tables, datasets, RecordBatchReaders, and scanners. See the
Python guides for more examples.
import duckdb
import pandas as pd
test_df = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
duckdb.sql("SELECT * FROM test_df").fetchall()
# [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
DuckDB also supports "registering" a DataFrame or Arrow object as a virtual table, comparable to a SQL VIEW. This is useful when querying
a DataFrame/Arrow object that is stored in another way (as a class variable, or a value in a dictionary). Below is a Pandas example that
manually registers a DataFrame stored in a dictionary:
import duckdb
import pandas as pd
my_dictionary = {}
my_dictionary["test_df"] = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three",
"four"]})
duckdb.register("test_df_view", my_dictionary["test_df"])
duckdb.sql("SELECT * FROM test_df_view").fetchall()
# [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
You can also create a persistent table in DuckDB from the contents of the DataFrame (or the view):
Pandas DataFrames – object Columns pandas.DataFrame columns of an object dtype require some special care, since this dtype
stores values of arbitrary type. To convert these columns to DuckDB, we first go through an analyze phase before converting the values. In
this analyze phase, a sample of the rows in the column is analyzed to determine the target type. This sample size is by default set to
1000. If the type picked during the analyze step is incorrect, this will result in a "Failed to cast value:" error, in which case you will need to
increase the sample size. The sample size can be changed by setting the pandas_analyze_sample config option.
Object Conversion
int Since integers can be of arbitrary size in Python, a one-to-one conversion is not possible for ints. Instead, we perform these casts
in order until one succeeds:
• BIGINT
• INTEGER
• UBIGINT
• UINTEGER
• DOUBLE
When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.
float These casts are tried in order until one succeeds:
• DOUBLE
• FLOAT
datetime.datetime For datetime we will check pandas.isnull if it's available and return NULL if it returns true. We check
against datetime.datetime.min and datetime.datetime.max to convert to -inf and +inf respectively.
If the datetime has tzinfo, we will use TIMESTAMPTZ, otherwise it becomes TIMESTAMP.
datetime.time If the time has tzinfo, we will use TIMETZ, otherwise it becomes TIME.
datetime.date date converts to the DATE type. We check against datetime.date.min and datetime.date.max to convert
to -inf and +inf respectively.
bytes bytes converts to BLOB by default. When it's used to construct a Value object of type BITSTRING, it maps to BITSTRING
instead.
list list becomes a LIST type of the "most permissive" type of its children, for example:
my_list_value = [
12345,
"test"
]
This becomes VARCHAR[] because 12345 can be converted to VARCHAR, but "test" cannot be converted to INTEGER.
[12345, test]
dict The dict object can convert to either STRUCT(...) or MAP(..., ...) depending on its structure. If the dict has a structure
similar to:
my_map_dict = {
"key": [
1, 2, 3
],
"value": [
"one", "two", "three"
]
}
Then we'll convert it to a MAP of key-value pairs of the two lists zipped together. The example above becomes a
MAP(INTEGER, VARCHAR):
Note. The names of the fields matter and the two lists need to have the same size.
my_struct_dict = {
1: "one",
"2": 2,
"three": [1, 2, 3],
False: True
}
Becomes:
tuple tuple converts to LIST by default. When it's used to construct a Value object of type STRUCT, it will convert to STRUCT
instead.
numpy.ndarray and numpy.datetime64 ndarray and datetime64 are converted by calling tolist() and converting the
result of that.
Result Conversion
DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.
NumPy
Pandas
Apache Arrow
Polars
Below are some examples using this functionality. See the Python guides for more examples.
# fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing
tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
# item value count
# 0 jeans 20.00 1
# 1 hammer 42.20 2
# 2 laptop 2000.00 1
# 3 chainsaw 500.00 10
# 4 iphone 300.00 2
Python DB API
The standard DuckDB Python API provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249, similar to the
SQLite Python API.
Connection
To use the module, you must first create a DuckDBPyConnection object that represents the database. The connection object takes as
a parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be .db,
.duckdb, or anything else). The special value :memory: (the default) can be used to create an in‑memory database. Note that for an
in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process). If you would like to connect to an
existing database in read‑only mode, you can set the read_only flag to True. Read‑only mode is required if multiple Python processes
want to access the same database file at the same time.
By default we create an in-memory database that lives inside the duckdb module. Every method of DuckDBPyConnection is also
available on the duckdb module, and this connection is what these methods use. You can also get a reference to this connection by
providing the special value :default: to connect.
import duckdb
┌───────┐
│ a │
│ int32 │
├───────┤
│ 42 │
└───────┘
import duckdb
# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")
If you want to create a second connection to an existing database, you can use the cursor() method. This might be useful for example
to allow parallel threads running queries independently. A single connection is thread‑safe but is locked for the duration of the queries,
effectively serializing database access in this case.
Connections are closed implicitly when they go out of scope or if they are explicitly closed using close(). Once the last connection to a
database instance is closed, the database instance is closed as well.
Querying
SQL queries can be sent to DuckDB using the execute() method of connections. Once a query has been executed, results can be retrieved
using the fetchone and fetchall methods on the connection. fetchall will retrieve all results and complete the transaction.
fetchone will retrieve a single row of results each time that it is invoked until no more results are available. The transaction will only close
once fetchone is called and there are no more results remaining (the return value will be None). As an example, in the case of a query
only returning a single row, fetchone should be called once to retrieve the results and a second time to close the transaction. Below are
some short examples:
# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
The description property of the connection object contains the column names as per the standard.
Prepared Statements DuckDB also supports prepared statements in the API with the execute and executemany methods. The values
may be passed as an additional parameter after a query that contains ? or $1 (dollar symbol and a number) placeholders. Using the ?
notation adds the values in the same sequence as passed within the Python parameter. Using the $ notation allows for values to be reused
within the SQL statement based on the number and index of the value found within the Python parameter.
Warning. Do not use executemany to insert large amounts of data into DuckDB. See the data ingestion page for better
options.
Named Parameters
Besides the standard unnamed parameters, such as $1, $2, etc., it's also possible to supply named parameters, such as $my_parameter. When
using named parameters, you have to provide a dictionary mapping of str to value in the parameters argument.
An example use:
import duckdb
res = duckdb.execute("""
SELECT
$my_param,
$other_param,
$also_param
""",
{
"my_param": 5,
"other_param": "DuckDB",
"also_param": [42]
}
).fetchall()
print(res)
# [(5, 'DuckDB', [42])]
Relational API
The Relational API is an alternative API that can be used to incrementally construct queries. The API is centered around DuckDBPyRelation
nodes. The relations can be seen as symbolic representations of SQL queries. They do not hold any data ‑ and nothing is executed ‑
until a method that triggers execution is called.
Constructing Relations
Relations can be created from SQL queries using the duckdb.sql method. Alternatively, they can be created from the various data ingestion
methods (read_parquet, read_csv, read_json).
import duckdb
rel = duckdb.sql("SELECT * FROM range(10_000_000_000) tbl(id)")
rel.show()
┌────────────────────────┐
│ id │
│ int64 │
├────────────────────────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
│ · │
│ · │
│ · │
│ 9990 │
│ 9991 │
│ 9992 │
│ 9993 │
│ 9994 │
│ 9995 │
│ 9996 │
│ 9997 │
│ 9998 │
│ 9999 │
├────────────────────────┤
│ ? rows │
│ (>9999 rows, 20 shown) │
└────────────────────────┘
Note how we are constructing a relation that computes an immense amount of data (10B rows, or 74GB of data). The relation is constructed
instantly ‑ and we can even print the relation instantly.
When printing a relation using show or displaying it in the terminal, the first 10K rows are fetched. If there are more than 10K rows, the
output window will show >9999 rows (as the number of rows in the relation is unknown).
Data Ingestion
Outside of SQL queries, the following methods are provided to construct relation objects from external data.
• from_arrow
• from_df
• read_csv
• read_json
• read_parquet
SQL Queries
Relation objects can be queried with SQL through so-called replacement scans. If you have a relation object stored in a variable, you
can refer to that variable as if it were a SQL table (in the FROM clause). This allows you to incrementally build queries using relation objects.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
duckdb.sql("SELECT sum(id) FROM rel").show()
┌──────────────┐
│ sum(id) │
│ int128 │
├──────────────┤
│ 499999500000 │
└──────────────┘
Operations
There are a number of operations that can be performed on relations. These are all short‑hand for running the SQL queries ‑ and will return
relations again themselves.
aggregate(expr, groups = {}) Apply an (optionally grouped) aggregate over the relation. The system will automatically group
by any columns that are not aggregates.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.aggregate("id % 2 AS g, sum(id), min(id), max(id)")
┌───────┬──────────────┬─────────┬─────────┐
│ g │ sum(id) │ min(id) │ max(id) │
│ int64 │ int128 │ int64 │ int64 │
├───────┼──────────────┼─────────┼─────────┤
│ 0 │ 249999500000 │ 0 │ 999998 │
│ 1 │ 250000000000 │ 1 │ 999999 │
└───────┴──────────────┴─────────┴─────────┘
except_(rel) Select all rows in the first relation that do not occur in the second relation. The relations must have the same number
of columns.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.except_(r2).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───────┘
filter(condition) Apply the given condition to the relation, filtering out any rows that do not satisfy the condition.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.filter("id > 5").limit(3).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 6 │
│ 7 │
│ 8 │
└───────┘
intersect(rel) Select the intersection of two relations ‑ returning all rows that occur in both relations. The relations must have the
same number of columns.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.intersect(r2).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
└───────┘
join(rel, condition, type = "inner") Combine two relations, joining them based on the provided condition.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)").set_alias("r1")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)").set_alias("r2")
r1.join(r2, "r1.id + 10 = r2.id").show()
┌───────┬───────┐
│ id │ id │
│ int64 │ int64 │
├───────┼───────┤
│ 0 │ 10 │
│ 1 │ 11 │
│ 2 │ 12 │
│ 3 │ 13 │
│ 4 │ 14 │
└───────┴───────┘
limit(n, offset = 0) Select the first n rows, optionally offsetting by offset rows.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.limit(3).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
└───────┘
order(expr) Sort the relation by the given set of expressions.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.order("id DESC").limit(3).show()
┌────────┐
│ id │
│ int64 │
├────────┤
│ 999999 │
│ 999998 │
│ 999997 │
└────────┘
project(expr) Apply the given expression to each row in the relation.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.project("id + 10 AS id_plus_ten").limit(3).show()
┌─────────────┐
│ id_plus_ten │
│ int64 │
├─────────────┤
│ 10 │
│ 11 │
│ 12 │
└─────────────┘
union(rel) Combine two relations, returning all rows in r1 followed by all rows in r2. The relations must have the same number of
columns.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)")
r1.union(r2).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 10 │
│ 11 │
│ 12 │
│ 13 │
│ 14 │
└───────┘
Result Output
The result of relations can be converted to various types of Python structures; see the result conversion page for more information.
The result of relations can also be written directly to files using the methods below.
• write_csv
• write_parquet
You can create a DuckDB user‑defined function (UDF) out of a Python function so it can be used in SQL queries. Similarly to regular functions,
they need to have a name, a return type and parameter types.
import duckdb
from duckdb.typing import *
from faker import Faker
def random_name():
    fake = Faker()
    return fake.name()
Creating Functions
To register a Python UDF, simply use the create_function method from a DuckDB connection. Here is the syntax:
import duckdb
con = duckdb.connect()
con.create_function(name, function, argument_type_list, return_type, type, null_handling)
1. name: A string representing the unique name of the UDF within the connection catalog.
2. function: The Python function you wish to register as a UDF.
3. return_type: Scalar functions return one element per row. This parameter specifies the return type of the function.
4. parameters: Scalar functions can operate on one or more columns. This parameter takes a list of column types used as input.
5. type (Optional): DuckDB supports both built‑in Python types and PyArrow Tables. By default, built‑in types are assumed, but you
can specify type = 'arrow' to use PyArrow Tables.
6. null_handling (Optional): By default, null values are automatically handled as Null‑In Null‑Out. Users can specify a desired behavior
for null values by setting null_handling = 'special'.
7. exception_handling (Optional): By default, when an exception is thrown from the Python function, it will be re‑thrown in Python.
Users can disable this behavior, and instead return null, by setting this parameter to 'return_null'.
8. side_effects (Optional): By default, functions are expected to produce the same result for the same input. If the result of a function
is impacted by any type of randomness, side_effects must be set to True.
To unregister a UDF, you can call the remove_function method with the UDF name:
con.remove_function(name)
Type Annotation
When the function has type annotations, it's often possible to leave out all of the optional parameters. Using DuckDBPyType we can
implicitly convert many known types to DuckDB's type system. For example:
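import duckdb
# the annotations let DuckDB infer the parameter type (BIGINT) and return type (VARCHAR)
def my_function(x: int) -> str:
    return str(x)
duckdb.create_function("my_func", my_function)
duckdb.sql("SELECT my_func(42)")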
┌─────────────┐
│ my_func(42) │
│ varchar │
├─────────────┤
│ 42 │
└─────────────┘
If only the parameter list types can be inferred, you'll need to pass in None as argument_type_list.
Null Handling
By default, when a function receives a NULL value, NULL is returned immediately without calling the function, as part of the default
NULL handling. When this is not desired, you need to explicitly set the null_handling parameter to "special".
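import duckdb
from duckdb.typing import *
def dont_intercept_null(x):
    return 5
# default NULL handling: the function is not called for NULL input
duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT)
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
# [(None,)]
# re-register with special NULL handling so the function receives the NULL
duckdb.remove_function("dont_intercept")
duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT, null_handling="special")
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
# [(5,)]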
Exception Handling
By default, when an exception is thrown from the Python function, we'll forward (re-throw) the exception. If you want to disable this
behavior, and instead return null, you'll need to set the exception_handling parameter to "return_null".
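import duckdb
from duckdb.typing import *
def will_throw():
    raise ValueError("ERROR")
# with default exception handling, the Python error is forwarded to the caller
duckdb.create_function("throws", will_throw, [], BIGINT)
try:
    res = duckdb.sql("SELECT throws()").fetchall()
except duckdb.InvalidInputException as e:
    print(e)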
Side Effects
By default DuckDB will assume the created function is a pure function, meaning it will produce the same output when given the same
input. If your function does not follow that rule, for example when your function makes use of randomness, then you will need to mark this
function as having side_effects.
For example, this function will produce a new count for every invocation:
def count() -> int:
    old = count.counter
    count.counter += 1
    return old
count.counter = 0
If we create this function without marking it as having side effects, the result will be the following:
con = duckdb.connect()
con.create_function("my_counter", count, side_effects = False)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
# [(0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,)]
This is obviously not the desired result. When we add side_effects = True, the result is as we would expect:
con.remove_function("my_counter")
count.counter = 0
con.create_function("my_counter", count, side_effects = True)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
# [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
Currently, two function types are supported, native (default) and arrow.
Arrow If the function is expected to receive arrow arrays, set the type parameter to 'arrow'.
This will let the system know to provide arrow arrays of up to STANDARD_VECTOR_SIZE tuples to the function, and also expect an array
of the same number of tuples to be returned from the function.
Native When the function type is set to native the function will be provided with a single tuple at a time, and expect only a single value
to be returned. This can be useful to interact with Python libraries that don't operate on Arrow, such as faker:
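import duckdb
from duckdb.typing import *
from faker import Faker
def random_date():
    fake = Faker()
    return fake.date_between()
duckdb.create_function("random_date", random_date, [], DATE, type="native")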
Types API
To make the API as easy to use as possible, we have added implicit conversions from existing type objects to a DuckDBPyType instance.
This means that wherever a DuckDBPyType object is expected, it is also possible to provide any of the options listed below.
Python Built‑ins The table below shows the mapping of Python Built‑in types to DuckDB type.
bool BOOLEAN
bytearray BLOB
bytes BLOB
float DOUBLE
int BIGINT
str VARCHAR
Numpy DTypes The table below shows the mapping of Numpy DType to DuckDB type.
bool BOOLEAN
float32 FLOAT
float64 DOUBLE
int16 SMALLINT
int32 INTEGER
int64 BIGINT
int8 TINYINT
uint16 USMALLINT
uint32 UINTEGER
uint64 UBIGINT
uint8 UTINYINT
Nested Types
list[child_type] list type objects map to a LIST type of the child type, which can also be arbitrarily nested.
import duckdb
from typing import Union
dict[key_type, value_type] dict type objects map to a MAP type of the key type and the value type.
import duckdb
duckdb.typing.DuckDBPyType(dict[str, int])
# MAP(VARCHAR, BIGINT)
{'a': field_one, 'b': field_two, .., 'n': field_n} dict objects map to a STRUCT composed of the keys and
values of the dict.
import duckdb
Union[ type_1 , ... type_n ] typing.Union objects map to a UNION type of the provided types.
import duckdb
from typing import Union
Creation Functions For the built‑in types, you can use the constants defined in duckdb.typing:
DuckDB type
BIGINT
BIT
BLOB
BOOLEAN
DATE
DOUBLE
FLOAT
HUGEINT
INTEGER
INTERVAL
SMALLINT
SQLNULL
TIME_TZ
TIME
TIMESTAMP_MS
TIMESTAMP_NS
TIMESTAMP_S
TIMESTAMP_TZ
TIMESTAMP
TINYINT
UBIGINT
UHUGEINT
UINTEGER
USMALLINT
UTINYINT
UUID
VARCHAR
For the complex types there are methods available on the DuckDBPyConnection object or the duckdb module. Anywhere a
DuckDBPyType is accepted, we will also accept one of the type objects that can implicitly convert to a DuckDBPyType.
list_type Parameters:
• child_type: DuckDBPyType
map_type Parameters:
• key_type: DuckDBPyType
• value_type: DuckDBPyType
decimal_type Parameters:
• width: int
• scale: int
union_type Parameters:
string_type Parameters:
• collation: Optional[str]
Expression API
Using this API makes it possible to dynamically build up expressions, which are typically created by the parser from the query string. This
allows you to skip that and have more fine‑grained control over the used expressions.
Below is a list of currently supported expressions that can be created through the API.
Column Expression
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
Star Expression
Optionally it's possible to provide an exclude list to filter out columns of the table. This exclude list can contain either strings or
Expressions.
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
Constant Expression
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
const = duckdb.ConstantExpression('hello')
res = duckdb.df(df).select(const).fetchall()
print(res)
# [('hello',), ('hello',), ('hello',), ('hello',)]
Case Expression
This expression contains a CASE WHEN (...) THEN (...) ELSE (...) END expression. By default ELSE is NULL and it can be
set using .otherwise(value = ...). Additional WHEN (...) THEN (...) blocks can be added with .when(condition = ...,
value = ...).
import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
CaseExpression
)
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
hello = ConstantExpression('hello')
world = ConstantExpression('world')
case = \
CaseExpression(condition = ColumnExpression('b') == False, value = world) \
.otherwise(hello)
res = duckdb.df(df).select(case).fetchall()
print(res)
# [('hello',), ('hello',), ('world',), ('hello',)]
Function Expression
This expression contains a function call. It can be constructed by providing the function name and an arbitrary number of Expressions as
arguments.
import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
FunctionExpression
)
df = pd.DataFrame({
'a': [
'test',
'pest',
'text',
'rest',
]
})
Common Operations
The Expression class also contains many operations that can be applied to any Expression type.
.cast(type: DuckDBPyType)
Applies a cast to the provided type on the expression.
.alias(name: str)
Apply an alias to the expression.
.isin(*exprs: Expression)
Create an IN expression against the provided expressions as the list.
.isnotin(*exprs: Expression)
Create a NOT IN expression against the provided expressions as the list.
Order Operations When expressions are provided to DuckDBPyRelation.order(), the following operations take effect:
.asc()
Indicates that this expression should be sorted in ascending order.
.desc()
Indicates that this expression should be sorted in descending order.
.nulls_first()
Indicates that the nulls in this expression should precede the non-null values.
.nulls_last()
Indicates that the nulls in this expression should come after the non‑null values.
Spark API
The DuckDB Spark API implements the PySpark API, allowing you to use the familiar Spark API to interact with DuckDB. All statements are
translated to DuckDB's internal plans using our relational API and executed using DuckDB's query engine.
Warning. The DuckDB Spark API is currently experimental and features are still missing. We are very interested in feedback.
Please report any functionality that you are missing, either through Discord or on GitHub.
Example
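from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql.functions import lit, col
import pandas as pd
spark = session.builder.getOrCreate()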
pandas_df = pd.DataFrame({
'age': [34, 45, 23, 56],
'name': ['Joan', 'Peter', 'John', 'Bob']
})
df = spark.createDataFrame(pandas_df)
df = df.withColumn(
'location', lit('Seattle')
)
res = df.select(
col('age'),
col('location')
).collect()
print(res)
[
Row(age=34, location='Seattle'),
Row(age=45, location='Seattle'),
Row(age=23, location='Seattle'),
Row(age=56, location='Seattle')
]
Contribution Guidelines
Contributions to the experimental Spark API are welcome. When making a contribution, please follow these guidelines:
Unfortunately there are some issues that are either beyond our control or are very elusive / hard to track down. Below is a list of these
issues that you might have to be aware of, depending on your workflow.
When using multithreading and fetching results either directly as NumPy arrays or indirectly through a Pandas DataFrame, it might
be necessary to ensure that numpy.core.multiarray is imported. If this module has not been imported from the main thread and a
different thread attempts to import it during execution, this causes either a deadlock or a crash.
When DuckDB is run in Jupyter notebooks or in the IPython shell, the output of the EXPLAIN statement contains hard line breaks (\n):
Out[1]:
┌───────────────┬────────────────────────────────────────────────────────────────────────────────────────────────
│ explain_key │ explain_value
│
│ varchar │ varchar
│
├───────────────┼────────────────────────────────────────────────────────────────────────────────────────────────
│ physical_plan │ ┌───────────────────────────┐\n│ PROJECTION │\n│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│\n│ x … │
└───────────────┴────────────────────────────────────────────────────────────────────────────────────────────────
Out[2]:
┌───────────────────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ x │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ DUMMY_SCAN │
└───────────────────────────┘
Please also check out the Jupyter guide for tips on using Jupyter with JupySQL.
When importing DuckDB on Windows, the Python runtime may return the following error:
import duckdb
ImportError: DLL load failed while importing duckdb: The specified module could not be found.
R API
Installation
The DuckDB R API can be installed using install.packages("duckdb"). Please see the installation page for details.
Reference Manual
The standard DuckDB R API implements the DBI interface for R. If you are not familiar with DBI yet, see here for an introduction.
Startup & Shutdown To use DuckDB, you must first create a connection object that represents the database. The connection object
takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be
.db, .duckdb, or anything else). The special value :memory: (the default) can be used to create an in‑memory database. Note that
for an in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the R process). If you would like to connect to an
existing database in read‑only mode, set the read_only flag to TRUE. Read‑only mode is required if multiple R processes want to access
the same database file at the same time.
library("duckdb")
# to start an in-memory database
con <- dbConnect(duckdb())
# or
con <- dbConnect(duckdb(), dbdir = ":memory:")
# to use a database file (not shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)
# to use a database file (shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = TRUE)
Connections are closed implicitly when they go out of scope or if they are explicitly closed using dbDisconnect(). To shut down the
database instance associated with the connection, use dbDisconnect(con, shutdown = TRUE).
Querying DuckDB supports the standard DBI methods to send queries and retrieve result sets. dbExecute() is meant for queries
where no results are expected, such as CREATE TABLE or UPDATE, and dbGetQuery() is meant for queries that produce
results (e.g., SELECT). Below is an example.
# create a table
dbExecute(con, "CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
dbExecute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
DuckDB also supports prepared statements in the R API with the dbExecute and dbGetQuery methods. Here is an example:
# if you want to reuse a prepared statement multiple times, use dbSendStatement() and dbBind()
stmt <- dbSendStatement(con, "INSERT INTO items VALUES (?, ?, ?)")
dbBind(stmt, list('iphone', 300, 2))
dbBind(stmt, list('android', 3.5, 1))
dbClearResult(stmt)
Warning. Do not use prepared statements to insert large amounts of data into DuckDB. See below for better options.
Efficient Transfer
To write an R data frame into DuckDB, use the standard DBI function dbWriteTable(). This creates a table in DuckDB and populates it
with the data frame contents. For example:
It is also possible to "register" an R data frame as a virtual table, comparable to a SQL VIEW. This does not actually transfer data into DuckDB
yet. Below is an example:
Note. DuckDB keeps a reference to the R data frame after registration. This prevents the data frame from being garbage‑collected.
The reference is cleared when the connection is closed, but can also be cleared manually using the duckdb_unregister()
method.
Also refer to the data import documentation for more options of efficiently importing data.
dbplyr
DuckDB also plays well with the dbplyr / dplyr packages for programmatic query construction from R. Here is an example:
library("duckdb")
library("dplyr")
con <- dbConnect(duckdb())
duckdb_register(con, "flights", nycflights13::flights)
When using dbplyr, CSV and Parquet files can be read using the dplyr::tbl function.
# Summarize the dataset in DuckDB to avoid reading the entire CSV into R's memory
tbl(con, "mtcars.csv") |>
group_by(cyl) |>
summarise(across(disp:wt, .fns = mean)) |>
collect()
# Summarize the dataset in DuckDB to avoid reading 12 Parquet files into R's memory
tbl(con, "read_parquet('dataset/**/*.parquet', hive_partitioning = true)") |>
filter(month == "3") |>
summarise(delay = mean(dep_time, na.rm = TRUE)) |>
collect()
Rust API
Installation
The DuckDB Rust API can be installed from crates.io. Please see the docs.rs for details.
duckdb-rs is an ergonomic wrapper based on the DuckDB C API. Please refer to the README for details.
Startup & Shutdown To use duckdb, you must first initialize a Connection handle using Connection::open().
Connection::open() takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file
extension may be .db, .duckdb, or anything else). You can also use Connection::open_in_memory() to create an in-memory
database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).
You can close the Connection manually with conn.close(), or simply let it go out of scope: Connection implements the Drop trait,
which automatically closes the underlying database connection for you.
Querying SQL queries can be sent to DuckDB using the execute() method of connections, or we can also prepare the statement and
then query on that.
#[derive(Debug)]
struct Person {
id: i32,
name: String,
data: Option<Vec<u8>>,
}
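// Illustrative row to insert (placeholder values)
let me = Person {
id: 0,
name: "Steven".to_string(),
data: None,
};
conn.execute(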
"INSERT INTO person (name, data) VALUES (?, ?)",
params![me.name, me.data],
)?;
Appender
The Rust client supports the DuckDB Appender API for bulk inserts. For example:
Swift API
DuckDB offers a Swift API. See the announcement post for details.
Instantiating DuckDB
DuckDB supports both in-memory and persistent databases. To work with an in-memory database, run:
Application Example
The rest of the page is based on the example of our announcement post, which uses raw data from NASA's Exoplanet Archive loaded directly
into DuckDB.
Creating an Application‑Specific Type We first create an application‑specific type that we'll use to house our database and connection
and through which we'll eventually define our app‑specific queries.
import DuckDB
Loading a CSV File We load the data from NASA's Exoplanet Archive:
wget "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv" -O downloaded_exoplanets.csv
Once we have our CSV downloaded locally, we can use the following SQL command to load it as a new table to DuckDB:
Let's package this up as a new asynchronous factory method on our ExoplanetStore type:
import DuckDB
import Foundation
Querying the Database The following example queries DuckDB from within Swift via an async function. This means the caller won't be
blocked while the query is executing. We'll then cast the result columns to Swift native types using DuckDB's ResultSet cast(to:)
family of methods, before finally wrapping them up in a DataFrame from the TabularData framework.
...
import TabularData
extension ExoplanetStore {
GROUP BY disc_year
ORDER BY disc_year
""")
Complete Project For the complete example project, clone the DuckDB Swift repo and open up the runnable app project located in
Examples/SwiftUI/ExoplanetExplorer.xcodeproj.
Wasm
DuckDB Wasm
DuckDB has been compiled to WebAssembly, so it can run inside any browser on any device.
DuckDB-Wasm offers a layered API: it can be embedded as a JavaScript + WebAssembly library, used as a Web shell, or built from source,
according to your needs.
Instantiation
Instantiation
cdn(jsdelivr)
webpack
vite
Statically Served
Data Ingestion
DuckDB‑Wasm has multiple ways to import data, depending on the format of the data.
First, the data file is imported into a local file system using register functions (registerEmptyFileBuffer, registerFileBuffer, registerFileHandle,
registerFileText, registerFileURL).
Then, the data file is imported into DuckDB using insert functions (insertArrowFromIPCStream, insertArrowTable, insertCSVFromPath,
insertJSONFromPath) or directly using a FROM SQL query (using extensions like Parquet or Wasm-flavored httpfs).
Data Import
Apache Arrow
// Write EOS
streamInserts.push(c.insertArrowFromIPCStream(EOS, { name: 'streamed' }));
await Promise.all(streamInserts);
CSV
JSON
// From API
const streamResponse = await fetch(`someapi/content.json`);
await db.registerFileBuffer('file.json', new Uint8Array(await streamResponse.arrayBuffer()));
await c.insertJSONFromPath('file.json', { name: 'JSONContent' });
Parquet
httpfs (Wasm‑flavored)
Insert Statement
Query
DuckDB‑Wasm provides functions for querying data. Queries are run sequentially.
First, a connection needs to be created by calling connect. Then, queries can be run by calling query or send.
Query Execution
Prepared Statements
// Query
const arrowResult = await conn.query<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`);
Export Parquet
// Export Parquet
conn.send(`COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT 'parquet');`);
const parquet_buffer = await this._db.copyFileToBuffer('result-snappy.parquet');
Extensions
DuckDB‑Wasm's (dynamic) extension loading is modeled after the regular DuckDB's extension loading, with a few relevant differences due
to the difference in platform.
Format
Extensions in DuckDB are binaries to be dynamically loaded via dlopen. A cryptographic signature is appended to the binary. An
extension in DuckDB-Wasm is a regular Wasm file to be dynamically loaded via Emscripten's dlopen. A cryptographic signature is appended
to the Wasm file as a WebAssembly custom section called duckdb_signature. This ensures the file remains a valid WebAssembly file.
Note. Currently, we require this custom section to be the last one, but this can potentially be relaxed in the future.
The INSTALL semantic in native embeddings of DuckDB is to fetch the extension, decompress it from gzip, and store it on local disk. The
LOAD semantic in native embeddings of DuckDB is to (optionally) perform signature checks and dynamically load the binary into the main
DuckDB binary.
In DuckDB-Wasm, INSTALL is a no-op, given that there is no durable cross-session storage. The LOAD operation fetches the extension
(decompressing it on the fly), performs signature checks, and dynamically loads it via the Emscripten implementation of dlopen.
Autoloading
Autoloading, i.e., the possibility for DuckDB to add extension functionality on‑the‑fly, is enabled by default in DuckDB‑Wasm.
WebAssembly is basically an additional platform, and there might be platform‑specific limitations that make some extensions not able
to match their native capabilities or to perform them in a different way. We will document here relevant differences for DuckDB‑hosted
extensions.
HTTPFS The HTTPFS extension is, at the moment, not available in DuckDB-Wasm. HTTPS protocol capabilities need to go through an
additional layer, the browser, which adds both differences and some restrictions compared to what is possible natively.
Instead, DuckDB-Wasm has a separate implementation that for most purposes is interchangeable, but does not support all use cases (as
it must follow security rules imposed by the browser, such as CORS). Due to this CORS restriction, any requests for data made using the
HTTPFS extension must be to websites that allow (using CORS headers) the website hosting the DuckDB-Wasm instance to access that data.
The MDN website is a great resource for more information regarding CORS.
Extension Signing
As with regular DuckDB extensions, DuckDB-Wasm extensions are by default checked on LOAD to verify the signature and confirm that the
extension has not been tampered with. Extension signature verification can be disabled via a configuration option. Signing is a property of
the binary itself, so copying a DuckDB extension (say, to serve it from a different location) will still keep a valid signature (e.g., for local
development).
Official DuckDB extensions are served at extensions.duckdb.org, and this is also the default value for the default_extension_repository
option. When installing extensions, a relevant URL will be built that will look like
extensions.duckdb.org/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.gz.
DuckDB-Wasm extensions are fetched only on load, and the URL will look like:
extensions.duckdb.org/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.
Note that an additional duckdb-wasm is added to the folder structure, and the file is served as a .wasm file.
DuckDB-Wasm extensions are served pre-compressed using Brotli compression. When fetched from a browser, extensions are
transparently decompressed. If you want to fetch a DuckDB-Wasm extension manually, you can use
curl --compressed extensions.duckdb.org/<...>/icu.duckdb_extension.wasm.
As with regular DuckDB, if you use SET custom_extension_repository = some.url.com, subsequent loads will be attempted
at some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.
Note that GET requests on the extensions need to be CORS-enabled for a browser to allow the connection.
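The URL layouts described in this section can be sketched as a small helper. This is only an illustration: the function name extension_url, the https:// scheme, and the sample version hash and platform strings are assumptions made for the example and are not taken from DuckDB itself. Only the path layout follows the text.

```python
def extension_url(name, version_hash, platform, wasm=False,
                  repository="extensions.duckdb.org"):
    """Build an extension URL following the layouts described in the text.

    The https:// scheme is an assumption; only the path layout is documented.
    """
    if wasm:
        # DuckDB-Wasm: extra "duckdb-wasm" folder, file served as .wasm
        path = f"duckdb-wasm/{version_hash}/{platform}/{name}.duckdb_extension.wasm"
    else:
        # Native DuckDB: gzip-compressed extension binary
        path = f"{version_hash}/{platform}/{name}.duckdb_extension.gz"
    return f"https://{repository}/{path}"


# Hypothetical version hash and platform names, for illustration only
native_url = extension_url("icu", "v0.0.0", "linux_amd64")
wasm_url = extension_url("icu", "v0.0.0", "wasm_eh", wasm=True)
custom_url = extension_url("icu", "v0.0.0", "wasm_eh", wasm=True,
                           repository="some.url.com")
print(native_url)
print(wasm_url)
print(custom_url)
```

The repository argument mirrors the effect of pointing the loader at a custom repository: only the host part changes, while the duckdb-wasm folder and the .wasm suffix stay the same.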
Tooling
Both DuckDB-Wasm and its extensions have been compiled using the latest packaged Emscripten toolchain.
ADBC API
Arrow Database Connectivity (ADBC), similarly to ODBC and JDBC, is a C‑style API that enables code portability between different database
systems. This allows developers to effortlessly build applications that communicate with database systems without using code specific to
that system. The main difference between ADBC and ODBC/JDBC is that ADBC uses Arrow to transfer data between the database system and
the application. DuckDB has an ADBC driver, which takes advantage of the zero‑copy integration between DuckDB and Arrow to efficiently
transfer data.
Please refer to the ADBC documentation page for a more extensive discussion on ADBC and a detailed API explanation.
Implemented Functionality
The DuckDB-ADBC driver implements the full ADBC specification, with the exception of the ConnectionReadPartition and
StatementExecutePartitions functions. Both of these functions exist to support systems that internally partition the query results, which
does not apply to DuckDB. In this section, we will describe the main functions that exist in ADBC, along with the arguments they take and
provide examples for each function.
Connection A set of functions that create and destroy a connection to interact with a database.
A set of functions that retrieve metadata about the database. In general, these functions will return Arrow objects, specifically an
ArrowArrayStream.
A set of functions with transaction semantics for the connection. By default, all connections start with auto‑commit mode on, but this can
be turned off via the ConnectionSetOption function.
Statement Statements hold state related to query execution. They represent both one‑off queries and prepared statements. They can
be reused; however, doing so will invalidate prior result sets from that statement.
The functions used to create, destroy, and set options for a statement:
Examples
Regardless of the programming language being used, there are two database options which will be required to utilize ADBC with DuckDB.
The first one is the driver, which takes a path to the DuckDB library. The second option is the entrypoint, which is an exported function
from the DuckDB‑ADBC driver that initializes all the ADBC functions. Once we have configured these two options, we can optionally set the
path option, providing a path on disk to store our DuckDB database. If not set, an in‑memory database is created. After configuring all the
necessary options, we can proceed to initialize our database. Below is how you can do so with various different language environments.
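The option set described above can be sketched as a plain dictionary before any ADBC call is made. This is a minimal sketch in Python that uses no ADBC library: the helper names libduckdb_filename and duckdb_adbc_options are hypothetical, and the directory path/to/ is a placeholder. The option keys (driver, entrypoint, path), the entrypoint name duckdb_adbc_init, and the .dylib/.so suffixes come from this section.

```python
import sys


def libduckdb_filename(platform=sys.platform):
    """Return the DuckDB shared library file name for the given platform."""
    if platform == "darwin":
        return "libduckdb.dylib"  # macOS
    if platform.startswith("linux"):
        return "libduckdb.so"  # Linux
    # Other platforms are not covered by this section
    raise ValueError(f"unsupported platform: {platform}")


def duckdb_adbc_options(driver_path, database_path=None):
    """Collect the ADBC database options described in the text as a dict."""
    options = {
        "driver": driver_path,             # path to the DuckDB library
        "entrypoint": "duckdb_adbc_init",  # exported initialization function
    }
    if database_path is not None:
        options["path"] = database_path    # omit for an in-memory database
    return options


in_memory = duckdb_adbc_options("path/to/" + libduckdb_filename("linux"))
on_disk = duckdb_adbc_options("path/to/" + libduckdb_filename("darwin"), "test.db")
print(in_memory)
print(on_disk)
```

Leaving out the path key corresponds to the in-memory default; each key in the dictionary maps onto one option that has to be set before the database is initialized.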
C++ We begin our C++ example by declaring the essential variables for querying data through ADBC. These variables include Error,
Database, Connection, Statement handling, and an Arrow Stream to transfer data between DuckDB and the application.
AdbcError adbc_error;
AdbcDatabase adbc_database;
AdbcConnection adbc_connection;
AdbcStatement adbc_statement;
ArrowArrayStream arrow_stream;
We can then initialize our database variable. Before initializing the database, we need to set the driver and entrypoint
options as mentioned above. Then we set the path option and initialize the database. With the example below, the string
"path/to/libduckdb.dylib" should be the path to the dynamic library for DuckDB. This will be .dylib on macOS, and
.so on Linux.
AdbcDatabaseNew(&adbc_database, &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "driver", "path/to/libduckdb.dylib", &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "entrypoint", "duckdb_adbc_init", &adbc_error);
// By default, we start an in-memory database, but you can optionally define a path to store it on disk.
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
AdbcDatabaseInit(&adbc_database, &adbc_error);
After initializing the database, we must create and initialize a connection to it.
AdbcConnectionNew(&adbc_connection, &adbc_error);
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);
We can now initialize our statement and run queries through our connection. After the AdbcStatementExecuteQuery call, the
arrow_stream is populated with the result.
Besides running queries, we can also ingest data via arrow_streams. For this we need to set an option with the table name we want to
insert to, bind the stream and then execute the query.
Python The first thing to do is to use pip to install the ADBC Driver Manager. You will also need to install pyarrow to directly
access Apache Arrow formatted result sets (such as using fetch_arrow_table).
Note. For details on the adbc_driver_manager package, see the adbc_driver_manager package documentation.
As with C++, we need to provide initialization options consisting of the location of the libduckdb shared object and entrypoint function.
Notice that the path argument for DuckDB is passed in through the db_kwargs dictionary.
import adbc_driver_duckdb.dbapi
Alongside fetch_arrow_table, other methods from DBApi are also implemented on the cursor, such as fetchone and fetchall.
Data can also be ingested via arrow_streams. We just need to set options on the statement to bind the stream of data and execute the
query.
import adbc_driver_duckdb.dbapi
import pyarrow
data = pyarrow.record_batch(
[[1, 2, 3, 4], ["a", "b", "c", "d"]],
names = ["ints", "strs"],
)
ODBC
ODBC (Open Database Connectivity) is a C-style API that provides access to different flavors of Database Management Systems (DBMSs).
The ODBC API consists of the Driver Manager (DM) and the ODBC drivers.
The DM is part of the system library, e.g., unixODBC, which manages the communications between the user applications and the ODBC
drivers. Typically, applications are linked against the DM, which uses Data Source Name (DSN) to look up the correct ODBC driver.
The ODBC driver is a DBMS implementation of the ODBC API, which handles all the internals of that DBMS.
The DM maps user application calls of ODBC functions to the correct ODBC driver that performs the specified function and returns the
proper values.
DuckDB supports ODBC version 3.0 according to the Core Interface Conformance.
We release the ODBC driver as assets for Linux and Windows. Users can download them from the Latest Release of DuckDB.
Operating Systems
A driver manager is required to manage communication between applications and the ODBC driver. We test and support unixODBC,
which is a complete ODBC driver manager for Linux. Users can install it from the command line:
Debian Flavors
Fedora Flavors
DuckDB releases the ODBC driver as an asset. For Linux, download it from the ODBC Linux Asset, which contains the following artifacts:
mkdir duckdb_odbc
unzip duckdb_odbc-linux-amd64.zip -d duckdb_odbc
The unixodbc_setup.sh script aids the configuration of the DuckDB ODBC Driver. It is based on the unixODBC package that provides
some commands to handle the ODBC setup and test like odbcinst and isql.
In a terminal window, change to the duckdb_odbc permanent directory, and run one of the following commands, with the level option
-u or -s, to configure DuckDB ODBC.
User-Level ODBC Setup (-u) The -u option uses the user's home directory to set up the ODBC init files.
./unixodbc_setup.sh -u
System-Level ODBC Setup (-s) The -s option changes the system-level files, which are visible to all users; because of that, it requires root
privileges.
sudo ./unixodbc_setup.sh -s
Show Usage (--help) The --help option shows the usage of unixodbc_setup.sh, which provides alternative options for a custom
configuration, like -db and -D.
./unixodbc_setup.sh --help
Level:
-s: System-level, using 'sudo' to configure DuckDB ODBC at the system-level, changing the files:
/etc/odbc[inst].ini
-u: User-level, configuring the DuckDB ODBC at the user-level, changing the files: ~/.odbc[inst].ini.
Options:
-db database_path: the DuckDB database file path, the default is ':memory:' if not provided.
-D driver_path: the driver file path (i.e., the path for libduckdb_odbc.so), the default is using the
base script directory
The ODBC setup on Linux is based on two files, the well-known .odbc.ini and .odbcinst.ini. These files can be placed in the system
/etc directory or in the user's home directory /home/<user> (abbreviated as ~/). The DM prioritizes the user configuration files over
the system files.
The .odbc.ini File The .odbc.ini contains the DSNs for the drivers, which can have specific knobs.
[DuckDB]
Driver = DuckDB Driver
Database = :memory:
Driver: it describes the driver's name, and other configurations will be placed at the .odbcinst.ini.
Database: it describes the database name used by DuckDB, and it can also be a file path to a .db in the system.
The .odbcinst.ini File The .odbcinst.ini contains general configurations for the ODBC drivers installed in the system. A driver
section starts with the driver name between brackets, followed by the configuration knobs specific to that driver.
[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace
[DuckDB Driver]
Driver = /home/<user>/duckdb_odbc/libduckdb_odbc.so
Trace: it enables the ODBC trace file using the option yes.
TraceFile: the absolute system file path for the ODBC trace file.
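The two files described above can also be generated programmatically. The following is a minimal sketch using Python's standard configparser module; the helper name render_ini and the driver path /path/to/duckdb_odbc/libduckdb_odbc.so are placeholders chosen for the example. The sketch renders the file contents as strings instead of writing to ~/.odbc.ini, so it has no side effects.

```python
import configparser
import io


def render_ini(sections):
    """Render a dict of {section: {key: value}} as INI text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original key capitalization
    for section, options in sections.items():
        parser[section] = options
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Contents for .odbc.ini: one DSN named DuckDB
odbc_ini = render_ini({
    "DuckDB": {"Driver": "DuckDB Driver", "Database": ":memory:"},
})

# Contents for .odbcinst.ini: tracing options plus the driver section
odbcinst_ini = render_ini({
    "ODBC": {"Trace": "yes", "TraceFile": "/tmp/odbctrace"},
    # Placeholder path; point this at the extracted driver library
    "DuckDB Driver": {"Driver": "/path/to/duckdb_odbc/libduckdb_odbc.so"},
})

print(odbc_ini)
print(odbcinst_ini)
```

The Driver value in the DSN section has to match the section name in .odbcinst.ini, which is how the driver manager finds the library for a given DSN.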
Microsoft Windows requires an ODBC Driver Manager to manage communication between applications and the ODBC drivers. The DM
on Windows is provided in a DLL file, odbccp32.dll, along with other files and tools. For detailed information, check out the Common ODBC
Component Files.
DuckDB releases the ODBC driver as an asset. For Windows, download it from the Windows Asset, which contains the following artifacts:
duckdb_odbc_setup.dll: a setup DLL used by the Windows ODBC Data Source Administrator tool.
mkdir duckdb_odbc
unzip duckdb_odbc-windows-amd64.zip -d duckdb_odbc
The odbc_install.exe aids the configuration of the DuckDB ODBC Driver on Windows. It depends on Odbccp32.dll, which provides
functions to configure the ODBC registry entries.
Windows administrator privileges are required. For a non-administrator user, a User Account Control prompt will be displayed:
The odbc_install.exe adds a default DSN configuration into the ODBC registries with a default database :memory:.
DSN Windows Setup After the installation, it is possible to change the default DSN configuration or add a new one using the Windows
ODBC Data Source Administrator tool odbcad32.exe.
Default DuckDB DSN The newly installed DSN is visible on the System DSN in the Windows ODBC Data Source Administrator tool:
Changing DuckDB DSN When selecting the default DSN (i.e., DuckDB) or adding a new configuration, the following setup window will
display:
This window allows you to set the DSN and the database file path associated with that DSN.
There are two ways to configure the ODBC driver, either by altering the registry keys as detailed below, or by connecting with
SQLDriverConnect. A combination of the two is also possible.
Furthermore, the ODBC driver supports all the configuration options included in DuckDB.
Note. If a configuration is set in both the connection string passed to SQLDriverConnect and in the odbc.ini file, the one
passed to SQLDriverConnect will take precedence.
Registry Keys The ODBC setup on Windows is based on registry keys (see Registry Entries for ODBC Components). The ODBC entries can
be placed at the current user registry key (HKCU) or the system registry key (HKLM).
We have tested and used the system entries based on HKLM->SOFTWARE->ODBC. The odbc_install.exe modifies this entry, which has
two subkeys: ODBC.INI and ODBCINST.INI.
The ODBC.INI is where users usually insert DSN registry entries for the drivers.
For example, the DSN registry for DuckDB would look like this:
The ODBCINST.INI contains one entry for each ODBC driver and other keys predefined for Windows ODBC configuration.
A driver manager is required to manage communication between applications and the ODBC driver. We test and support unixODBC,
which is a complete ODBC driver manager for macOS (and Linux). Users can install it from the command line:
Brew
DuckDB releases the ODBC driver as an asset. For macOS, download it from the ODBC macOS asset, which contains the following artifacts:
libduckdb_odbc.dylib: the DuckDB ODBC driver compiled for macOS (with Intel and Apple Silicon support).
mkdir duckdb_odbc
unzip duckdb_odbc-osx-universal.zip -d duckdb_odbc
There are two ways to configure the ODBC driver, either by initializing the configuration files listed below, or by connecting with
SQLDriverConnect. A combination of the two is also possible.
Furthermore, the ODBC driver supports all the configuration options included in DuckDB.
Note. If a configuration is set in both the connection string passed to SQLDriverConnect and in the odbc.ini file, the one
passed to SQLDriverConnect will take precedence.
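The precedence rule in the note above can be modeled in a few lines. This is only a sketch of the rule, not the driver's actual implementation: the function name effective_settings is hypothetical, the sample connection string is an assumption, and keys are compared case-insensitively as a simplification.

```python
import configparser

ODBC_INI = """\
[DuckDB]
Driver = DuckDB Driver
Database = :memory:
access_mode = read_only
"""


def effective_settings(odbc_ini_text, connection_string):
    """Merge DSN settings with connection-string settings (the latter win)."""
    # Parse "key=value;key=value" pairs from the connection string
    overrides = {}
    for pair in connection_string.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            overrides[key.strip().lower()] = value.strip()
    dsn = overrides.pop("dsn", None)

    # Start from the DSN section of odbc.ini, if the DSN is defined there
    parser = configparser.ConfigParser()
    parser.read_string(odbc_ini_text)
    found = dsn is not None and parser.has_section(dsn)
    settings = dict(parser[dsn]) if found else {}

    # Values from the connection string take precedence
    settings.update(overrides)
    return settings


result = effective_settings(ODBC_INI, "DSN=DuckDB;access_mode=read_write")
print(result)
```

Here access_mode is set in both places, so the value from the connection string replaces the one from the file, while Driver and Database are still taken from the DSN section.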
The odbc.ini or .odbc.ini File The .odbc.ini contains the DSNs for the drivers, which can have specific knobs.
[DuckDB]
Driver = DuckDB Driver
Database=:memory:
access_mode=read_only
allow_unsigned_extensions=true
The .odbcinst.ini File The .odbcinst.ini contains general configurations for the ODBC drivers installed in the system. A driver
section starts with the driver name between brackets, followed by the configuration knobs specific to that driver.
[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace
[DuckDB Driver]
Driver = /Users/<user>/duckdb_odbc/libduckdb_odbc.dylib
After the configuration, an ODBC client can be used to validate the installation. unixODBC provides a command line tool called isql.
isql DuckDB
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| echo [string] |
| quit |
| |
+---------------------------------------+
+------------+
| 42 |
+------------+
| 42 |
+------------+
SQLRowCount returns -1
1 rows fetched
Configuration
Configuration
DuckDB has a number of configuration options that can be used to change the behavior of the system.
The configuration options can be set using either the SET statement or the PRAGMA statement. They can be reset to their original values
using the RESET statement. The values of configuration options can be queried via the current_setting() scalar function or using
the duckdb_settings() table function.
Examples
┌─────────┐
│ threads │
│ int64 │
├─────────┤
│ 10 │
└─────────┘
┌─────────┬─────────┬─────────────────────────────────────────────────┬────────────┐
│ name │ value │ description │ input_type │
│ varchar │ varchar │ varchar │ varchar │
├─────────┼─────────┼─────────────────────────────────────────────────┼────────────┤
│ threads │ 10 │ The number of total threads used by the system. │ BIGINT │
└─────────┴─────────┴─────────────────────────────────────────────────┴────────────┘
Secrets Manager
DuckDB has a Secrets manager, which provides a unified user interface for secrets across all backends (e.g., AWS S3) that use them.
Configuration Reference
Name | Description | Input type | Default value
ordered_aggregate_threshold | The number of rows to accumulate before sorting, used for tuning | UBIGINT | 262144
password | The password to use. Ignored for legacy compatibility. | VARCHAR | NULL
perfect_ht_threshold | Threshold in bytes for when to use a perfect hash table (default: 12) | BIGINT | 12
pivot_filter_threshold | The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator | BIGINT | 10
pivot_limit | The maximum number of pivot columns in a pivot statement (default: 100000) | BIGINT | 100000
prefer_range_joins | Force use of range joins with mixed predicates | BOOLEAN | false
preserve_identifier_case | Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers | BOOLEAN | true
preserve_insertion_order | Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses. | BOOLEAN | true
profile_output, profiling_output | The file to which profile output should be saved, or empty to print to the terminal | VARCHAR |
profiling_mode | The profiling mode (STANDARD or DETAILED) | VARCHAR | NULL
progress_bar_time | Sets the time (in milliseconds) how long a query needs to take before we start printing a progress bar | BIGINT | 2000
s3_access_key_id | S3 Access Key ID | VARCHAR |
s3_endpoint | S3 Endpoint (empty for default endpoint) | VARCHAR |
s3_region | S3 Region (default us-east-1) | VARCHAR | us-east-1
s3_secret_access_key | S3 Access Key | VARCHAR |
s3_session_token | S3 Session Token | VARCHAR |
s3_uploader_max_filesize | S3 Uploader max filesize (between 50GB and 5TB, default 800GB) | VARCHAR | 800GB
s3_uploader_max_parts_per_file | S3 Uploader max parts per file (between 1 and 10000, default 10000) | UBIGINT | 10000
s3_uploader_thread_limit | S3 Uploader global thread limit (default 50) | UBIGINT | 50
s3_url_compatibility_mode | Disable Globs and Query Parameters on S3 URLs | BOOLEAN | 0
s3_url_style | S3 URL style ('vhost' (default) or 'path') | VARCHAR | vhost
s3_use_ssl | S3 use SSL (default true) | BOOLEAN | 1
schema | Sets the default search schema. Equivalent to setting search_path to a single value. | VARCHAR | main
search_path | Sets the default catalog search path as a comma-separated list of values | VARCHAR |
secret_directory Set the directory to which persistent secrets are stored VARCHAR ~/.duckdb/stored_
secrets
temp_directory Set the directory to which to write temp files VARCHAR
304
DuckDB Documentation
threads, worker_ The number of total threads used by the system. BIGINT # Cores
threads
username, user The username to use. Ignored for legacy compatibility. VARCHAR NULL
Pragmas
The PRAGMA statement is an SQL extension adopted by DuckDB from SQLite. PRAGMA statements can be issued in a similar manner to
regular SQL statements. PRAGMA commands may alter the internal state of the database engine, and can influence the subsequent execution
or behavior of the engine.
PRAGMA statements that assign a value to an option can also be issued using the SET statement and the value of an option can be retrieved
using SELECT current_setting(option_name).
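-- list all databases
PRAGMA database_list;
-- list all tables
PRAGMA show_tables;
-- list all tables, with extra information
PRAGMA show_tables_expanded;
-- list all functions
PRAGMA functions;
-- get information about the columns of a specific table
PRAGMA table_info('table_name');
CALL pragma_table_info('table_name');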
table_info returns information about the columns of the table with name table_name. The exact format of the table returned is given
below:
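The column listing itself is not reproduced here. As a hedged sketch for inspecting the format yourself, the following creates a hypothetical table (table_info_demo and its columns are assumptions made for illustration) and requests its column information, which comes back as one row per column:

```sql
-- hypothetical table used only for illustration
CREATE OR REPLACE TABLE table_info_demo (
    id INTEGER PRIMARY KEY,
    label VARCHAR DEFAULT 'n/a'
);

-- one result row per column of table_info_demo
PRAGMA table_info('table_info_demo');
```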
To also show table structure, but in a slightly different format (included for compatibility):
PRAGMA show('table_name');
Memory Limit
Set the memory limit for the buffer manager:
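A minimal sketch, assuming a limit of 1GB (the value is arbitrary and chosen only for illustration). Both option names refer to the same setting:

```sql
-- cap the buffer manager at 1GB
SET memory_limit = '1GB';
-- max_memory is an alias for memory_limit
SET max_memory = '1GB';
```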
Warning. The specified memory limit is only applied to the buffer manager. For most queries, the buffer manager handles
the majority of the data processed. However, certain in‑memory data structures such as vectors and query results are allocated
outside of the buffer manager. Additionally, aggregate functions with complex state (e.g., list, mode, quantile, string_agg,
and approx functions) use memory outside of the buffer manager. Therefore, the actual memory consumption can be higher than
the specified memory limit.
-- set the number of threads used for parallel query execution
SET threads = 4;
Database Size
Get the file and memory size of each database:
PRAGMA database_size;
CALL pragma_database_size();
database_size returns information about the file and memory size of each database. The column types of the returned results are
given below:
-- list all available collations
PRAGMA collations;
Implicit Casting to VARCHAR
Prior to version 0.10.0, DuckDB would automatically allow any type to be implicitly cast to VARCHAR
during function binding. As a result, it was possible to, e.g., compute the substring of an integer without using an explicit cast. For version
0.10.0 and later, an explicit cast is needed instead. To revert to the old behavior that performs implicit casting, set the
old_implicit_casting variable to true.
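As a short sketch of both approaches (the literal 42 is an arbitrary value chosen for illustration):

```sql
-- with the default behavior (version 0.10.0 and later), cast explicitly
SELECT substring(CAST(42 AS VARCHAR), 1, 1);

-- alternatively, revert to the old implicit casting behavior
SET old_implicit_casting = true;
```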
Default Ordering for NULLs
Set the default ordering for NULLs to be either NULLS FIRST or NULLS LAST:
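A minimal sketch using the default_null_order setting, assuming the lowercase, underscore-separated spellings of the two values:

```sql
-- sort NULL values before all other values by default
SET default_null_order = 'nulls_first';
-- sort NULL values after all other values by default
SET default_null_order = 'nulls_last';
```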
-- show the DuckDB version
PRAGMA version;
CALL pragma_version();
Platform
platform returns an identifier for the platform the current DuckDB executable has been compiled for, e.g., osx_arm64. The
format of this identifier matches the platform name as described on the extension loading explainer.
PRAGMA platform;
CALL pragma_platform();
-- show a progress bar when running queries
PRAGMA enable_progress_bar;
-- do not show a progress bar when running queries
PRAGMA disable_progress_bar;
Profiling
-- enable profiling
PRAGMA enable_profiling;
PRAGMA enable_profile;
Profiling Format
The format of the resulting profiling information can be specified as either json, query_tree, or query_tree_optimizer.
The default format is query_tree, which prints the physical operator tree together with the timings and cardinalities of
each operator in the tree to the screen.
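A sketch of selecting each of the three formats named above, assuming the format is passed as the value of the enable_profiling setting:

```sql
-- emit profiling information as JSON
SET enable_profiling = 'json';
-- print the physical operator tree (the default format)
SET enable_profiling = 'query_tree';
-- print the operator tree including optimizer information
SET enable_profiling = 'query_tree_optimizer';
```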
-- disable profiling
PRAGMA disable_profiling;
PRAGMA disable_profile;
Profiling Output
By default, profiling information is printed to the console. However, if you prefer to write the profiling information to a
file, the profiling_output option can be used to specify the target file. Note that the file contents will be overwritten for every
new query that is issued, hence the file will only contain the profiling information of the last query that is run.
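A minimal sketch; the path is a placeholder and must be replaced with a writable location on your system:

```sql
-- write profiling information to a file instead of the console
SET profiling_output = '/path/to/file.json';
```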
Profiling Mode
By default, a limited amount of profiling information is provided (standard). For more details, use the detailed profiling
mode by setting profiling_mode to detailed. The output of this mode shows how long it takes to apply certain optimizers on the
query tree and how long physical planning takes.
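For instance, switching between the two modes might look like this (a sketch that assumes profiling has already been enabled):

```sql
-- request detailed profiling information, including optimizer and planner timings
SET profiling_mode = 'detailed';
-- return to the limited, standard output
SET profiling_mode = 'standard';
```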
-- disable the query optimizer
PRAGMA disable_optimizer;
-- enable the query optimizer
PRAGMA enable_optimizer;
Explain Plan Output
The output of EXPLAIN can be configured to show only the physical plan. This is the default configuration.
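A sketch using the explain_output setting. Only the physical-only behavior is described in the text above; the 'all' value is included as an additional illustration:

```sql
-- show only the physical plan (the default)
SET explain_output = 'physical_only';
-- show all available plans
SET explain_output = 'all';
```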
Full-Text Search Indexes
The create_fts_index and drop_fts_index options are only available when the fts extension is
loaded. Their usage is documented on the Full‑Text Search extension page.
-- enable verification of external operators
PRAGMA verify_external;
-- disable verification of external operators
PRAGMA disable_verify_external;
Verification of Round-Trip Capabilities
Enable verification of round-trip capabilities for supported logical plans:
PRAGMA verify_serializer;
PRAGMA disable_verify_serializer;
-- enable caching of objects, e.g., Parquet metadata
PRAGMA enable_object_cache;
-- disable caching of objects
PRAGMA disable_object_cache;
Checkpoint
Force Checkpoint
When CHECKPOINT is called and no changes have been made, force a checkpoint regardless:
PRAGMA force_checkpoint;
Checkpoint on Shutdown
Run a CHECKPOINT on successful shutdown and delete the WAL, to leave only a single database file behind:
PRAGMA enable_checkpoint_on_shutdown;
PRAGMA disable_checkpoint_on_shutdown;
Progress Bar
Enable printing of the progress bar (if it's possible):
PRAGMA enable_print_progress_bar;
PRAGMA disable_print_progress_bar;
Temp Directory for Spilling Data to Disk
By default, DuckDB uses a temporary directory named <database_file_name>.tmp to
spill to disk, located in the same directory as the database file. To change this, use:
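A minimal sketch using the temp_directory setting; the path is a placeholder to be replaced with a directory of your choice:

```sql
-- spill temporary data to a custom directory
SET temp_directory = '/path/to/temp_dir.tmp/';
```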
-- get storage information for a specific table
PRAGMA storage_info('table_name');
CALL pragma_storage_info('table_name');
This call returns the following information for the given table:
Name | Type | Description
row_group_id | BIGINT |
column_name | VARCHAR |
column_id | BIGINT |
column_path | VARCHAR |
segment_id | BIGINT |
segment_type | VARCHAR |
start | BIGINT | The start row id of this chunk
count | BIGINT | The amount of entries in this storage chunk
compression | VARCHAR | Compression type used for this column - see blog post
stats | VARCHAR |
has_updates | BOOLEAN |
persistent | BOOLEAN | false if temporary table
block_id | BIGINT | empty unless persistent
block_offset | BIGINT | empty unless persistent
Show Databases
The following statement is equivalent to the SHOW DATABASES statement:
PRAGMA show_databases;
User Agent
The following statement returns the user agent information, e.g., duckdb/v0.10.0(osx_arm64).
PRAGMA user_agent;
Metadata Information
The following statement returns information on the metadata store (block_id, total_blocks, free_blocks, and free_list).
PRAGMA metadata_info;
Selectively Disabling Optimizers
The disabled_optimizers option allows selectively disabling optimization steps. For example,
to disable filter_pushdown and statistics_propagation, run:
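A sketch of this, assuming the optimizer names are passed as a single comma-separated string:

```sql
-- disable two optimization steps (for debugging only)
SET disabled_optimizers = 'filter_pushdown,statistics_propagation';

-- list the optimizations that can be disabled
SELECT *
FROM duckdb_optimizers();
```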
The available optimizations can be queried using the duckdb_optimizers() table function.
Warning. The disabled_optimizers option should only be used for debugging performance issues and should be
avoided in production.
Returning Errors as JSON
The errors_as_json option can be set to obtain error information in raw JSON format. For certain errors,
extra information or decomposed information is provided for easier machine processing. For example:
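The JSON error shown next is consistent with statements along these lines. This is a reconstruction offered as a sketch, and the second statement is expected to fail because the table does not exist:

```sql
-- report errors as raw JSON
SET errors_as_json = true;
-- deliberately query a table that does not exist
SELECT * FROM nonexistent_tbl;
```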
{
    "exception_type":"Catalog",
    "exception_message":"Table with name nonexistent_tbl does not exist!\nDid you mean \"temp.information_schema.tables\"?",
    "name":"nonexistent_tbl",
    "candidates":"temp.information_schema.tables",
    "position":"14",
    "type":"Table",
    "error_subtype":"MISSING_ENTRY"
}
Query Verification (for Development)
The following PRAGMAs are mostly used for development and internal testing.
PRAGMA enable_verification;
PRAGMA disable_verification;
PRAGMA verify_parallelism;
PRAGMA disable_verify_parallelism;
Secrets Manager
The Secrets manager provides a unified user interface for secrets across all backends that use them. Secrets can be scoped, so different
storage prefixes can have different secrets, allowing for example to join data across organizations in a single query. Secrets can also be
persisted, so that they do not need to be specified every time DuckDB is launched.
Warning. Persistent secrets are stored in unencrypted binary format on the disk.
Secrets
Types of Secrets
Secrets are typed: the type identifies which service a secret is for. Currently, secret types are available for the following cloud services:
AWS S3 (S3), Google Cloud Storage (GCS), Cloudflare R2 (R2), and Azure Blob Storage (AZURE).
For each type, there are one or more "secret providers" that specify how the secret is created. Secrets can also have an optional scope,
which is a file path prefix that the secret applies to. When fetching a secret for a path, the secret scopes are compared to the path, returning
the matching secret for the path. In the case of multiple matching secrets, the longest prefix is chosen.
Creating a Secret
Secrets can be created using the CREATE SECRET SQL statement. Secrets can be temporary or persistent.
Temporary secrets are used by default and are stored in memory for the life span of the DuckDB instance, similar to how settings worked
previously. Persistent secrets are stored in unencrypted binary format in the ~/.duckdb/stored_secrets directory. On startup of
DuckDB, persistent secrets are read from this directory and automatically loaded.
Secret Providers
To create a secret, a Secret Provider needs to be used. A Secret Provider is a mechanism through which a secret
is generated. To illustrate this, for the S3, GCS, R2, and AZURE secret types, DuckDB currently supports two providers: CONFIG and
CREDENTIAL_CHAIN. The CONFIG provider requires the user to pass all configuration information into the CREATE SECRET, whereas
the CREDENTIAL_CHAIN provider will automatically try to fetch credentials. When no Secret Provider is specified, the CONFIG provider
is used. For more details on how to create secrets using different providers, check out the respective pages on httpfs and azure.
Temporary Secrets
To create a temporary unscoped secret to access S3, we can now use the following:
CREATE SECRET (
TYPE S3,
KEY_ID 'mykey',
SECRET 'mysecret',
REGION 'myregion'
);
Note that we implicitly use the default CONFIG secret provider here.
Persistent Secrets
In order to persist secrets between DuckDB database instances, we can now use the CREATE PERSISTENT SECRET
command, e.g.:
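A minimal sketch; the secret name my_persistent_secret and the credential values are placeholders chosen for illustration:

```sql
-- stored on disk and loaded again on the next startup
CREATE PERSISTENT SECRET my_persistent_secret (
    TYPE S3,
    KEY_ID 'mykey',
    SECRET 'mysecret'
);
```

As noted above, this writes the secret to the ~/.duckdb/stored_secrets directory in unencrypted form.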
Deleting Secrets
Secrets can be deleted using the DROP SECRET statement, e.g.:
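For instance, assuming a persistent secret named my_persistent_secret exists (the name is a placeholder):

```sql
-- remove the persistent secret with the given name
DROP PERSISTENT SECRET my_persistent_secret;
```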
Creating Multiple Secrets for the Same Service Type
If two secrets exist for a service type, the scope can be used to decide which one
should be used. For example:
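A sketch with two scoped S3 secrets. The name secret2 and the bucket my-other-bucket follow the text below; secret1, my-bucket, and all credential values are placeholders assumed for illustration:

```sql
-- applies to paths starting with s3://my-bucket
CREATE SECRET secret1 (
    TYPE S3,
    KEY_ID 'my_key1',
    SECRET 'my_secret1',
    SCOPE 's3://my-bucket'
);

-- applies to paths starting with s3://my-other-bucket
CREATE SECRET secret2 (
    TYPE S3,
    KEY_ID 'my_key2',
    SECRET 'my_secret2',
    SCOPE 's3://my-other-bucket'
);
```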
Now, if the user queries something from s3://my-other-bucket/something, secret secret2 will be chosen automatically for
that request. To see which secret is being used, the which_secret scalar function can be used, which takes a path and a secret type as
parameters:
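A hedged sketch of such a lookup. It assumes which_secret can be called in the FROM clause and that the secret type is passed as the lowercase string 's3'; the file name is a placeholder:

```sql
-- show which secret would be used for this path
FROM which_secret('s3://my-other-bucket/file.parquet', 's3');
```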
Listing Secrets
Secrets can be listed using the built-in duckdb_secrets() table function:
FROM duckdb_secrets();
SQL
SQL Introduction
Here we provide an overview of how to perform simple operations in SQL. This tutorial is only intended to give you an introduction and is
in no way a complete tutorial on SQL. This tutorial is adapted from the PostgreSQL tutorial.
In the examples that follow, we assume that you have installed the DuckDB Command Line Interface (CLI) shell. See the installation page
for information on how to install the CLI.
Concepts
DuckDB is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. A relation
is essentially a mathematical term for a table.
Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific
data type. Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that you can access.
You can create a new table by specifying the table name, along with all column names and their types:
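A sketch of such a statement, using the weather table and the column names and types that the following paragraphs and result tables refer to:

```sql
CREATE TABLE weather (
    city    VARCHAR,
    temp_lo INTEGER, -- minimum temperature on a day
    temp_hi INTEGER, -- maximum temperature on a day
    prcp    REAL,    -- precipitation
    date    DATE
);
```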
You can enter this into the shell with the line breaks. The command is not terminated until the semicolon.
White space (i.e., spaces, tabs, and newlines) can be used freely in SQL commands. That means you can type the command aligned differently
than above, or even all on one line. Two dash characters (--) introduce comments. Whatever follows them is ignored up to the end
of the line. SQL is case-insensitive about keywords and identifiers.
In the SQL command, we first specify the type of command that we want to perform: CREATE TABLE. After that follow the parameters
for the command. First, the table name, weather, is given. Then the column names and column types follow.
city VARCHAR specifies that the table has a column called city that is of type VARCHAR. VARCHAR specifies a data type that can store
text of arbitrary length. The temperature fields are stored in an INTEGER type, a type that stores integer numbers (i.e., whole numbers
without a decimal point). REAL columns store single precision floating‑point numbers (i.e., numbers with a decimal point). DATE stores a
date (i.e., year, month, day combination). DATE only stores the specific day, not a time associated with that day.
DuckDB supports the standard SQL types INTEGER, SMALLINT, REAL, DOUBLE, DECIMAL, CHAR(n), VARCHAR(n), DATE, TIME and
TIMESTAMP.
The second example will store cities and their associated geographical location:
CREATE TABLE cities (
    name VARCHAR,
    lat DECIMAL,
    lon DECIMAL
);
Finally, it should be mentioned that if you don't need a table any longer or want to recreate it differently you can remove it using the
following command:
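A minimal sketch using a hypothetical scratch table, so that the weather and cities tables used in the rest of this tutorial stay intact:

```sql
-- scratch_table is a placeholder name used only for illustration
CREATE TABLE scratch_table (i INTEGER);
-- remove the table and all of its data
DROP TABLE scratch_table;
```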
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
Constants that are not numeric values (e.g., text and dates) must be surrounded by single quotes (''), as in the example. Input dates for
the date type must be formatted as 'YYYY-MM-DD'.
The syntax used so far requires you to remember the order of the columns. An alternative syntax allows you to list the columns explicitly:
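For example, the second San Francisco row that appears in the query results later in this tutorial could be inserted like this (a sketch reconstructed from those results):

```sql
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
```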
You can list the columns in a different order if you wish or even omit some columns, e.g., if the prcp is unknown:
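A sketch matching the Hayward row shown in the later results, where prcp is left empty:

```sql
-- prcp is omitted, so it is stored as NULL
INSERT INTO weather (date, city, temp_hi, temp_lo)
VALUES ('1994-11-29', 'Hayward', 54, 37);
```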
Many developers consider explicitly listing the columns better style than relying on the order implicitly.
Please enter all the commands shown above so you have some data to work with in the following sections.
You could also have used COPY to load large amounts of data from CSV files. This is usually faster because the COPY command is optimized
for this application while allowing less flexibility than INSERT. An example with weather.csv would be:
COPY weather
FROM 'weather.csv';
The source file must be available on the machine running the process. There are many other ways of loading data
into DuckDB, see the corresponding documentation section for more information.
Querying a Table
To retrieve data from a table, the table is queried. A SQL SELECT statement is used to do this. The statement is divided into a select list
(the part that lists the columns to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional
qualification (the part that specifies any restrictions). For example, to retrieve all the rows of table weather, type:
SELECT *
FROM weather;
Here * is a shorthand for "all columns". The same result can be obtained with:
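Spelled out with the column names of the weather table, in the order shown in the result, this could look like:

```sql
SELECT city, temp_lo, temp_hi, prcp, date
FROM weather;
```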
┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘
You can write expressions, not just simple column references, in the select list. For example, you can do:
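A sketch of a query consistent with the temp_avg column in the result that follows:

```sql
-- average of the daily high and low temperature
SELECT city, (temp_hi + temp_lo) / 2 AS temp_avg, date
FROM weather;
```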
┌───────────────┬──────────┬────────────┐
│ city │ temp_avg │ date │
│ varchar │ double │ date │
├───────────────┼──────────┼────────────┤
│ San Francisco │ 48.0 │ 1994-11-27 │
│ San Francisco │ 50.0 │ 1994-11-29 │
│ Hayward │ 45.5 │ 1994-11-29 │
└───────────────┴──────────┴────────────┘
Notice how the AS clause is used to relabel the output column. (The AS clause is optional.)
A query can be "qualified" by adding a WHERE clause that specifies which rows are wanted. The WHERE clause contains a Boolean (truth
value) expression, and only rows for which the Boolean expression is true are returned. The usual Boolean operators (AND, OR, and NOT)
are allowed in the qualification. For example, the following retrieves the weather of San Francisco on rainy days:
SELECT *
FROM weather
WHERE city = 'San Francisco' AND prcp > 0.0;
Result:
┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
└───────────────┴─────────┴─────────┴───────┴────────────┘
You can request that the results of a query be returned in sorted order:
SELECT *
FROM weather
ORDER BY city;
┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘
In this example, the sort order isn't fully specified, and so you might get the San Francisco rows in either order. But you'd always get the
results shown above if you do:
SELECT *
FROM weather
ORDER BY city, temp_lo;
You can request that duplicate rows be removed from the result of a query:
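A sketch using DISTINCT on the city column, consistent with the result that follows:

```sql
SELECT DISTINCT city
FROM weather;
```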
┌───────────────┐
│ city │
│ varchar │
├───────────────┤
│ Hayward │
│ San Francisco │
└───────────────┘
Here again, the result row ordering might vary. You can ensure consistent results by using DISTINCT and ORDER BY together:
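For instance:

```sql
-- remove duplicates and return the cities in alphabetical order
SELECT DISTINCT city
FROM weather
ORDER BY city;
```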
Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a
way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables
at one time is called a join query. As an example, say you wish to list all the weather records together with the location of the associated
city. To do that, we need to compare the city column of each row of the weather table with the name column of all rows in the cities
table, and select the pairs of rows where these values match.
SELECT *
FROM weather, cities
WHERE city = name;
┌───────────────┬─────────┬─────────┬───────┬────────────┬───────────────┬───────────────┬───────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ lat │ lon │
│ varchar │ int32 │ int32 │ float │ date │ varchar │ decimal(18,3) │ decimal(18,3) │
├───────────────┼─────────┼─────────┼───────┼────────────┼───────────────┼───────────────┼───────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ -194.000 │ 53.000 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │ San Francisco │ -194.000 │ 53.000 │
└───────────────┴─────────┴─────────┴───────┴────────────┴───────────────┴───────────────┴───────────────┘
Observe two things about the result set:
• There is no result row for the city of Hayward. This is because there is no matching entry in the cities table for Hayward, so the
join ignores the unmatched rows in the weather table. We will see shortly how this can be fixed.
• There are two columns containing the city name. This is correct because the lists of columns from the weather and cities tables
are concatenated. In practice this is undesirable, though, so you will probably want to list the output columns explicitly rather than
using *:
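A sketch that lists the output columns explicitly, in the order shown in the result header that follows:

```sql
SELECT city, temp_lo, temp_hi, prcp, date, lon, lat
FROM weather, cities
WHERE city = name;
```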
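┌───────────────┬─────────┬─────────┬───────┬────────────┬───────────────┬───────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ lon │ lat │
│ varchar │ int32 │ int32 │ float │ date │ decimal(18,3) │ decimal(18,3) │
├───────────────┼─────────┼─────────┼───────┼────────────┼───────────────┼───────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ 53.000 │ -194.000 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │ 53.000 │ -194.000 │
└───────────────┴─────────┴─────────┴───────┴────────────┴───────────────┴───────────────┘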
Since the columns all had different names, the parser automatically found which table they belong to. If there were duplicate column
names in the two tables you'd need to qualify the column names to show which one you meant, as in:
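A sketch of the same join with every column qualified by its table name:

```sql
SELECT weather.city, weather.temp_lo, weather.temp_hi,
       weather.prcp, weather.date, cities.lon, cities.lat
FROM weather, cities
WHERE cities.name = weather.city;
```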
It is widely considered good style to qualify all column names in a join query, so that the query won't fail if a duplicate column name is later
added to one of the tables.
Join queries of the kind seen thus far can also be written in this alternative form:
SELECT *
FROM weather
INNER JOIN cities ON weather.city = cities.name;
This syntax is not as commonly used as the one above, but we show it here to help you understand the following topics.
Now we will figure out how we can get the Hayward records back in. What we want the query to do is to scan the weather table and for
each row to find the matching cities row(s). If no matching row is found we want some "empty values" to be substituted for the cities
table's columns. This kind of query is called an outer join. (The joins we have seen so far are inner joins.) The command looks like this:
SELECT *
FROM weather
LEFT OUTER JOIN cities ON weather.city = cities.name;
┌───────────────┬─────────┬─────────┬───────┬────────────┬───────────────┬───────────────┬───────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ lat │ lon │
│ varchar │ int32 │ int32 │ float │ date │ varchar │ decimal(18,3) │ decimal(18,3) │
├───────────────┼─────────┼─────────┼───────┼────────────┼───────────────┼───────────────┼───────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ -194.000 │ 53.000 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │ San Francisco │ -194.000 │ 53.000 │
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │ │ │ │
└───────────────┴─────────┴─────────┴───────┴────────────┴───────────────┴───────────────┴───────────────┘
This query is called a left outer join because the table mentioned on the left of the join operator will have each of its rows in the output
at least once, whereas the table on the right will only have those rows output that match some row of the left table. When outputting a
left‑table row for which there is no right‑table match, empty (null) values are substituted for the right‑table columns.
Aggregate Functions
Like most other relational database products, DuckDB supports aggregate functions. An aggregate function computes a single result from
multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum)
over a set of rows.
SELECT max(temp_lo)
FROM weather;
┌──────────────┐
│ max(temp_lo) │
│ int32 │
├──────────────┤
│ 46 │
└──────────────┘
If we wanted to know what city (or cities) that reading occurred in, we might try:
SELECT city
FROM weather
WHERE temp_lo = max(temp_lo); -- WRONG
but this will not work since the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause
determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are
computed.) However, as is often the case the query can be restated to accomplish the desired result, here by using a subquery:
SELECT city
FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
┌───────────────┐
│ city │
│ varchar │
├───────────────┤
│ San Francisco │
└───────────────┘
This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in
the outer query.
Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the maximum low temperature observed
in each city with:
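A sketch consistent with the result that follows:

```sql
-- one output row per city
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
```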
┌───────────────┬──────────────┐
│ city │ max(temp_lo) │
│ varchar │ int32 │
├───────────────┼──────────────┤
│ San Francisco │ 46 │
│ Hayward │ 37 │
└───────────────┴──────────────┘
Which gives us one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these
grouped rows using HAVING:
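A sketch consistent with the result that follows, keeping only groups whose maximum low temperature is below 40:

```sql
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING max(temp_lo) < 40;
```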
┌─────────┬──────────────┐
│ city │ max(temp_lo) │
│ varchar │ int32 │
├─────────┼──────────────┤
│ Hayward │ 37 │
└─────────┴──────────────┘
which gives us the same results for only the cities that have all temp_lo values below 40. Finally, if we only care about cities whose names
begin with "S", we can use the LIKE operator:
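A sketch that combines the city name filter with the grouping and HAVING condition from the previous example:

```sql
SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%' -- only cities whose names begin with "S"
GROUP BY city
HAVING max(temp_lo) < 40;
```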
More information about the LIKE operator can be found in the pattern matching page.
It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses. The fundamental difference
between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows
go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE
clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the
aggregates. On the other hand, the HAVING clause always contains aggregate functions.
In the previous example, we can apply the city name restriction in WHERE, since it needs no aggregate. This is more efficient than adding
the restriction to HAVING, because we avoid doing the grouping and aggregate calculations for all rows that fail the WHERE check.
Updates
You can update existing rows using the UPDATE command. Suppose you discover the temperature readings are all off by 2 degrees after
November 28. You can correct the data as follows:
UPDATE weather
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
WHERE date > '1994-11-28';
SELECT *
FROM weather;
┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 41 │ 55 │ 0.0 │ 1994-11-29 │
│ Hayward │ 35 │ 52 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘
Deletions
Rows can be removed from a table using the DELETE command. Suppose you are no longer interested in the weather of Hayward. Then
you can do the following to delete those rows from the table:
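A sketch of the statement, followed in the text by a query showing the remaining rows:

```sql
-- remove all weather records for Hayward
DELETE FROM weather
WHERE city = 'Hayward';
```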
SELECT *
FROM weather;
┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 41 │ 55 │ 0.0 │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘
Without a qualification, DELETE will remove all rows from the given table, leaving it empty. The system will not request confirmation
before doing this!
Statements
Statements Overview
The ALTER TABLE statement changes the schema of an existing table in the catalog.
Examples
-- add a new column with name "k" to the table "integers", it will be filled with the default value NULL
ALTER TABLE integers ADD COLUMN k INTEGER;
-- add a new column with name "l" to the table integers, it will be filled with the default value 10
ALTER TABLE integers ADD COLUMN l INTEGER DEFAULT 10;
-- change the type of the column "i" to the type "VARCHAR" using a standard cast
ALTER TABLE integers ALTER i TYPE VARCHAR;
-- change the type of the column "i" to the type "VARCHAR", using the specified expression to convert the data for each row
ALTER TABLE integers ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);
-- rename a table
ALTER TABLE integers RENAME TO integers_old;
Syntax
ALTER TABLE changes the schema of an existing table. All the changes made by ALTER TABLE fully respect the transactional semantics,
i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback.
RENAME TABLE
-- rename a table
ALTER TABLE integers RENAME TO integers_old;
The RENAME TO clause renames an entire table, changing its name in the schema. Note that any views that rely on the table are not
automatically updated.
RENAME COLUMN
The RENAME COLUMN clause renames a single column within a table. Any constraints that rely on this name (e.g., CHECK constraints) are
automatically updated. However, note that any views that rely on this column name are not automatically updated.
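A minimal sketch on a hypothetical table (integers_demo and its columns are assumptions made so the example is self-contained). The COLUMN keyword is optional:

```sql
-- hypothetical table used only for illustration
CREATE OR REPLACE TABLE integers_demo (i INTEGER, j INTEGER);

-- rename column "i" to "k"
ALTER TABLE integers_demo RENAME i TO k;
-- rename column "j" to "l", spelling out the COLUMN keyword
ALTER TABLE integers_demo RENAME COLUMN j TO l;
```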
ADD COLUMN
-- add a new column with name "k" to the table "integers", it will be filled with the default value NULL
ALTER TABLE integers ADD COLUMN k INTEGER;
-- add a new column with name "l" to the table integers, it will be filled with the default value 10
ALTER TABLE integers ADD COLUMN l INTEGER DEFAULT 10;
The ADD COLUMN clause can be used to add a new column of a specified type to a table. The new column will be filled with the specified
default value, or NULL if none is specified.
DROP COLUMN
The DROP COLUMN clause can be used to remove a column from a table. Note that columns can only be removed if they do not have any
indexes that rely on them. This includes any indexes created as part of a PRIMARY KEY or UNIQUE constraint. Columns that are part of
multi‑column check constraints cannot be dropped either.
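A minimal sketch on a hypothetical table without indexes or constraints (integers_demo is an assumption made so the example is self-contained):

```sql
-- hypothetical table used only for illustration
CREATE OR REPLACE TABLE integers_demo (i INTEGER, k INTEGER);

-- remove the column "k" from the table
ALTER TABLE integers_demo DROP COLUMN k;
```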
ALTER TYPE
-- change the type of the column "i" to the type "VARCHAR" using a standard cast
ALTER TABLE integers ALTER i TYPE VARCHAR;
-- change the type of the column "i" to the type "VARCHAR", using the specified expression to convert the data for each row
ALTER TABLE integers ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);
The SET DATA TYPE clause changes the type of a column in a table. Any data present in the column is converted according to the
provided expression in the USING clause, or, if the USING clause is absent, cast to the new data type. Note that columns can only have
their type changed if they do not have any indexes that rely on them and are not part of any CHECK constraints.
The SET/DROP DEFAULT clause modifies the DEFAULT value of an existing column. Note that this does not modify any existing data in
the column. Dropping the default is equivalent to setting the default value to NULL.
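The following sketch illustrates both clauses on a hypothetical table default_demo. The comments describe the behavior stated above; they are not captured output.

```sql
-- hypothetical table used only for this sketch
CREATE TABLE default_demo (i INTEGER);
INSERT INTO default_demo VALUES (1);
-- set the default of column "i" to 10; the existing row keeps the value 1
ALTER TABLE default_demo ALTER COLUMN i SET DEFAULT 10;
-- rows inserted without an explicit value now receive the default 10
INSERT INTO default_demo DEFAULT VALUES;
-- drop the default, which is equivalent to a default of NULL
ALTER TABLE default_demo ALTER COLUMN i DROP DEFAULT;
INSERT INTO default_demo DEFAULT VALUES;
SELECT * FROM default_demo;
```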
Warning. At the moment, DuckDB will not allow you to alter a table if there are any dependencies. That means that if you
have an index on a column, you will first need to drop the index, alter the table, and then recreate the index. Otherwise, you will get a
"Dependency Error".
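The drop, alter, recreate sequence might look like the sketch below. The table index_demo and the index i_idx are hypothetical names.

```sql
-- hypothetical table and index used only for this sketch
CREATE TABLE index_demo (i INTEGER, j INTEGER);
CREATE INDEX i_idx ON index_demo (i);
-- altering column "i" while i_idx exists is expected to fail with a dependency error,
-- so the index is dropped first
DROP INDEX i_idx;
ALTER TABLE index_demo ALTER i TYPE BIGINT;
-- recreate the index once the table has been altered
CREATE INDEX i_idx ON index_demo (i);
```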
Note. The ADD CONSTRAINT and DROP CONSTRAINT clauses are not yet supported in DuckDB.
ALTER VIEW Statement
The ALTER VIEW statement changes the schema of an existing view in the catalog.
Examples
-- rename a view
ALTER VIEW v1 RENAME TO v2;
ALTER VIEW changes the schema of an existing view. All the changes made by ALTER VIEW fully respect the transactional semantics,
i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback. Note that other views that
rely on the altered view are not automatically updated.
ATTACH/DETACH Statement
The ATTACH statement adds a new database file to the catalog that can be read from and written to.
Examples
-- attach the database "file.db" with the alias inferred from the name ("file")
ATTACH 'file.db';
-- attach the database "file.db" with an explicit alias ("file_db")
ATTACH 'file.db' AS file_db;
-- attach the database "file.db" in read only mode
ATTACH 'file.db' (READ_ONLY);
-- attach a SQLite database for reading and writing (see the sqlite extension for more information)
ATTACH 'sqlite_file.db' AS sqlite_db (TYPE SQLITE);
-- attach the database "file.db" if inferred database alias "file" does not yet exist
ATTACH IF NOT EXISTS 'file.db';
-- attach the database "file.db" if explicit database alias "file_db" does not yet exist
ATTACH IF NOT EXISTS 'file.db' AS file_db;
-- create a table in the attached database with alias "file"
CREATE TABLE file.new_table (i INTEGER);
-- detach the database with alias "file"
DETACH file;
-- show a list of all attached databases
SHOW DATABASES;
-- change the default database that is used to the database "file"
USE file;
Attach
Attach Syntax ATTACH allows DuckDB to operate on multiple database files, and allows for transfer of data between different database
files.
Detach
The DETACH statement allows previously attached database files to be closed and detached, releasing any locks held on the database file.
It is not possible to detach from the default database: if you would like to do so, issue the USE statement to change the default database
to another one.
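As a sketch of this workflow, the statements below switch the default database before detaching. The aliases db1 and db2 are hypothetical, and in-memory databases are used so that no files are created.

```sql
-- attach two in-memory databases under hypothetical aliases
ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
-- make db1 the default database; it cannot be detached while it is the default
USE db1;
-- switch the default to db2, after which db1 can be detached
USE db2;
DETACH db1;
```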
Warning. Closing the connection, e.g., invoking the close() function in Python, does not release the locks held on the
database files as the file handles are held by the main DuckDB instance (in Python's case, the duckdb module).
Detach Syntax
Name Qualification
The fully qualified name of catalog objects contains the catalog, the schema and the name of the object. For example:
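A minimal sketch, assuming a database file named file.db (so the inferred catalog alias is "file"). The schema and table names are illustrative.

```sql
-- attach a database; its alias ("file") acts as the catalog name
ATTACH 'file.db';
-- create a schema inside that catalog
CREATE SCHEMA file.new_schema;
-- fully qualified name: catalog.schema.object
CREATE TABLE file.new_schema.new_table (i INTEGER);
```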
Note that often the fully qualified name is not required. When a name is not fully qualified, the system looks for which entries to reference
using the catalog search path. The default catalog search path includes the system catalog, the temporary catalog and the initially attached
database together with the main schema.
Default Database and Schema When a table is created without any qualifications, the table is created in the default schema of the default
database. The default database is the database that is opened when the system is started, and the default schema is main.
Changing the Default Database and Schema The default database and schema can be changed using the USE command.
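A short sketch of both forms of USE follows. The database new_db and the schema my_schema are illustrative names.

```sql
ATTACH 'new_db.db';
CREATE SCHEMA new_db.my_schema;
-- set the default database and schema to new_db.main
USE new_db;
-- set the default database and schema to new_db.my_schema
USE new_db.my_schema;
```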
Resolving Conflicts When providing only a single qualification, the system can interpret this as either a catalog or a schema, as long as
there are no conflicts. For example:
ATTACH 'new_db.db';
CREATE SCHEMA my_schema;
-- creates the table "new_db.main.tbl"
CREATE TABLE new_db.tbl (i INTEGER);
-- creates the table "default_db.my_schema.tbl"
CREATE TABLE my_schema.tbl (i INTEGER);
If we create a conflict (i.e., we have both a schema and a catalog with the same name) the system requests that a fully qualified path is used
instead:
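The sketch below provokes such a conflict. It assumes that new_db is attached and that the default database is a different one. The exact error text is not reproduced here because it may differ between versions.

```sql
ATTACH IF NOT EXISTS 'new_db.db';
-- create a schema in the default database that shares its name with the attached catalog
CREATE SCHEMA new_db;
-- "new_db" now matches both a catalog and a schema, so this statement is expected to
-- fail with an error asking for a fully qualified path
CREATE TABLE new_db.tbl (i INTEGER);
-- a fully qualified name (catalog.schema.table) removes the ambiguity
CREATE TABLE new_db.main.tbl2 (i INTEGER);
```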
Changing the Catalog Search Path The catalog search path can be adjusted by setting the search_path configuration option, which
uses a comma‑separated list of values that will be on the search path. The following example demonstrates searching in two databases:
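As a sketch, two in-memory databases are attached under the hypothetical aliases db1 and db2, and their tables are then referenced first by qualified name and afterwards through the search path.

```sql
ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
CREATE TABLE db1.tbl1 (i INTEGER);
CREATE TABLE db2.tbl2 (j INTEGER);
-- reference the tables using their qualified names
SELECT * FROM db1.tbl1;
SELECT * FROM db2.tbl2;
-- or set the search path and reference the tables by name only
SET search_path = 'db1,db2';
SELECT * FROM tbl1;
SELECT * FROM tbl2;
```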
Transactional Semantics
When running queries on multiple databases, the system opens separate transactions per database. The transactions are started lazily by
default: when a given database is referenced for the first time in a query, a transaction for that database will be started. Running
SET immediate_transaction_mode = true changes this behavior so that transactions are started eagerly in all attached databases instead.
While multiple transactions can be active at a time, the system only supports writing to a single attached database in a single transaction.
If you try to write to multiple attached databases in a single transaction, the following error will be thrown:
Attempting to write to database "db2" in a transaction that has already modified database "db1" -
a single transaction can only write to a single attached database.
The reason for this restriction is that the system does not maintain atomicity for transactions across attached databases. Transactions are
only atomic within each database file. By restricting the global transaction to write to only a single database file the atomicity guarantees
are maintained.
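The restriction can be reproduced with a sketch like the one below. The aliases db1 and db2 and the table name t are hypothetical.

```sql
ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
CREATE TABLE db1.t (i INTEGER);
CREATE TABLE db2.t (i INTEGER);
BEGIN TRANSACTION;
-- the first write binds this transaction to db1
INSERT INTO db1.t VALUES (1);
-- a write to a second attached database in the same transaction is expected to fail
INSERT INTO db2.t VALUES (2);
ROLLBACK;
-- writing to each database in its own transaction is allowed
INSERT INTO db1.t VALUES (1);
INSERT INTO db2.t VALUES (2);
```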
CALL Statement
The CALL statement invokes the given table function and returns the results.
Examples
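A minimal sketch using two built-in table functions. The table call_demo is a hypothetical name created only so that pragma_table_info has something to describe.

```sql
-- invoke the duckdb_functions table function
CALL duckdb_functions();
-- invoke pragma_table_info on a hypothetical table
CREATE TABLE call_demo (i INTEGER);
CALL pragma_table_info('call_demo');
```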
Syntax
CHECKPOINT Statement
The CHECKPOINT statement synchronizes data in the write‑ahead log (WAL) to the database data file. For in‑memory databases this
statement will succeed with no effect.
Examples
-- Synchronize data in the specified database ("file_db")
CHECKPOINT file_db;
-- Abort any in-progress transactions to synchronize the data
FORCE CHECKPOINT;
Syntax
Checkpoint operations happen automatically based on the WAL size (see Configuration). This statement is for manual checkpoint
actions.
Behavior
The default CHECKPOINT command will fail if there are any running transactions. Including FORCE will abort any transactions and execute
the checkpoint operation.
Also see the related PRAGMA option for further behavior modification.
Reclaiming Space When performing a checkpoint (automatic or otherwise), the space occupied by deleted rows is partially reclaimed.
Note that this does not remove all deleted rows, but rather merges row groups that have a significant amount of deletes together. In the
current implementation this requires ~25% of rows to be deleted in adjacent row groups.
When running in in‑memory mode, checkpointing has no effect, hence it does not reclaim space after deletes in in‑memory databases.
Warning. The VACUUM statement does not trigger vacuuming deletes and hence does not reclaim space.
COMMENT ON Statement
The COMMENT ON statement allows adding metadata to catalog entries (tables, columns, etc.). It follows the PostgreSQL syntax.
Examples
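A short sketch of the statement on a hypothetical table. It assumes, following the PostgreSQL syntax, that setting a comment to NULL removes it.

```sql
-- hypothetical table used only for this sketch
CREATE TABLE test_table (test_table_column INTEGER);
-- attach a comment to the table
COMMENT ON TABLE test_table IS 'very nice table';
-- attach a comment to a column of the table
COMMENT ON COLUMN test_table.test_table_column IS 'very nice column';
-- remove the table comment again (assumed behavior, following PostgreSQL)
COMMENT ON TABLE test_table IS NULL;
```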
Reading Comments
Comments can be read by querying the comment column of the respective metadata functions:
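For instance, comments on tables and columns could be read back as sketched below. The table commented is a hypothetical name.

```sql
-- hypothetical table used only for this sketch
CREATE TABLE commented (c INTEGER);
COMMENT ON TABLE commented IS 'a table comment';
COMMENT ON COLUMN commented.c IS 'a column comment';
-- table comments are exposed by duckdb_tables()
SELECT table_name, comment FROM duckdb_tables() WHERE table_name = 'commented';
-- column comments are exposed by duckdb_columns()
SELECT column_name, comment FROM duckdb_columns() WHERE table_name = 'commented';
```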
Limitations
Syntax
COPY Statement
Examples
-- read a CSV file into the lineitem table, using auto-detected CSV options
COPY lineitem FROM 'lineitem.csv';
-- read a CSV file into the lineitem table, using manually specified CSV options
COPY lineitem FROM 'lineitem.csv' (DELIMITER '|');
-- read a Parquet file into the lineitem table
COPY lineitem FROM 'lineitem.pq' (FORMAT PARQUET);
-- read a JSON file into the lineitem table, using auto-detected options
COPY lineitem FROM 'lineitem.json' (FORMAT JSON, AUTO_DETECT true);
-- read a CSV file into the lineitem table, using double quotes
COPY lineitem FROM "lineitem.csv";
-- read a CSV file into the lineitem table, omitting quotes
COPY lineitem FROM lineitem.csv;
Overview
COPY moves data between DuckDB and external files. COPY ... FROM imports data into DuckDB from an external file. COPY ... TO
writes data from DuckDB to an external file. The COPY command can be used for CSV, PARQUET and JSON files.
COPY ... FROM imports data from an external file into an existing table. The data is appended to whatever data is in the table already.
The number of columns in the file must match the number of columns in the table table_name, and the contents of the columns
must be convertible to the column types of the table. If this is not possible, an error will be thrown.
If a list of columns is specified, COPY will only copy the data in the specified columns from the file. If there are any columns in the table that
are not in the column list, COPY ... FROM will insert the default values for those columns.
-- Copy the contents of a comma-separated file 'test.csv' without a header into the table 'test'
COPY test FROM 'test.csv';
-- Copy the contents of a comma-separated file with a header into the 'category' table
COPY category FROM 'categories.csv' (HEADER);
-- Copy the contents of 'lineitem.tbl' into the 'lineitem' table, where the contents are delimited by a
-- pipe character ('|')
COPY lineitem FROM 'lineitem.tbl' (DELIMITER '|');
-- Copy the contents of 'lineitem.tbl' into the 'lineitem' table, where the delimiter, quote character,
-- and presence of a header are automatically detected
COPY lineitem FROM 'lineitem.tbl' (AUTO_DETECT true);
-- Read the contents of a comma-separated file 'names.csv' into the 'name' column of the 'category'
-- table. Any other columns of this table are filled with their default value.
COPY category(name) FROM 'names.csv';
-- Read the contents of a Parquet file 'lineitem.parquet' into the lineitem table
COPY lineitem FROM 'lineitem.parquet' (FORMAT PARQUET);
-- Read the contents of a newline-delimited JSON file 'lineitem.ndjson' into the lineitem table
COPY lineitem FROM 'lineitem.ndjson' (FORMAT JSON);
-- Read the contents of a JSON file 'lineitem.json' into the lineitem table
COPY lineitem FROM 'lineitem.json' (FORMAT JSON, ARRAY true);
Syntax
COPY ... TO
COPY ... TO exports data from DuckDB to an external CSV, Parquet, or JSON file. It has mostly the same set of options as COPY ... FROM;
however, in the case of COPY ... TO the options specify how the file should be written to disk. Any file created by COPY ... TO can
be copied back into the database by using COPY ... FROM with a similar set of options.
The COPY ... TO statement can be called specifying either a table name or a query. When a table name is specified, the contents of the
entire table will be written into the resulting file. When a query is specified, the query is executed and the result of the query is written to
the resulting file.
-- Copy the contents of the 'lineitem' table to a CSV file with a header
COPY lineitem TO 'lineitem.csv';
-- Copy the contents of the 'lineitem' table to the file 'lineitem.tbl',
-- where the columns are delimited by a pipe character ('|'), including a header line.
COPY lineitem TO 'lineitem.tbl' (DELIMITER '|');
-- Use tab separators to create a TSV file without a header
COPY lineitem TO 'lineitem.tsv' (DELIMITER '\t', HEADER false);
-- Copy the l_orderkey column of the 'lineitem' table to the file 'orderkey.tbl'
COPY lineitem(l_orderkey) TO 'orderkey.tbl' (DELIMITER '|');
-- Copy the result of a query to the file 'query.csv', including a header with column names
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.csv' (DELIMITER ',');
-- Copy the result of a query to the Parquet file 'query.parquet'
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.parquet' (FORMAT PARQUET);
-- Copy the result of a query to the newline-delimited JSON file 'query.ndjson'
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.ndjson' (FORMAT JSON);
-- Copy the result of a query to the JSON file 'query.json'
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.json' (FORMAT JSON, ARRAY true);
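The round trip described above (exporting with COPY ... TO and re-importing with COPY ... FROM using matching options) can be sketched as follows. The table names src and dst and the file name roundtrip.tbl are hypothetical.

```sql
-- hypothetical source table
CREATE TABLE src AS SELECT 42 AS a, 'hello' AS b;
-- export with a pipe delimiter and a header line
COPY src TO 'roundtrip.tbl' (DELIMITER '|', HEADER);
-- the target table must already exist with compatible column types
CREATE TABLE dst (a INTEGER, b VARCHAR);
-- import again using the same options that were used for writing
COPY dst FROM 'roundtrip.tbl' (DELIMITER '|', HEADER);
SELECT * FROM dst;
```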
COPY ... TO Options Zero or more copy options may be provided as a part of the copy operation. The WITH specifier is optional, but
if any options are specified, the parentheses are required. Parameter values can be passed in with or without wrapping in single quotes.
Any option that is a Boolean can be enabled or disabled in multiple ways. You can write true, ON, or 1 to enable the option, and false,
OFF, or 0 to disable it. The BOOLEAN value can also be omitted, e.g., by only passing (HEADER), in which case true is assumed.
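As a sketch of these spellings, each statement below writes the same query result and differs only in how the Boolean HEADER option is given. The file name bool_demo.csv is hypothetical.

```sql
-- omitted value: true is assumed
COPY (SELECT 42 AS a, 'hello' AS b) TO 'bool_demo.csv' (HEADER);
-- explicit true
COPY (SELECT 42 AS a, 'hello' AS b) TO 'bool_demo.csv' (HEADER true);
-- numeric form of true
COPY (SELECT 42 AS a, 'hello' AS b) TO 'bool_demo.csv' (HEADER 1);
-- disabled, written with the optional WITH specifier
COPY (SELECT 42 AS a, 'hello' AS b) TO 'bool_demo.csv' WITH (HEADER false);
-- numeric form of false, combined with another option
COPY (SELECT 42 AS a, 'hello' AS b) TO 'bool_demo.csv' (DELIMITER '|', HEADER 0);
```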
The below options are applicable to all formats written with COPY.
Syntax
COPY FROM DATABASE ... TO
The COPY FROM DATABASE ... TO statement copies the entire content from one attached database to another attached database.
This includes the schema, including constraints, indexes, sequences, macros, and the data itself.
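The following is a reconstructed sketch rather than a verbatim example. The file names, the table tbl, and the macro two_x_plus_y are illustrative. With x = 42 and y = 3, the final query evaluates 2 * 42 + 3 = 87 through the copied macro, which is consistent with the result shown next.

```sql
-- source database with a table and a macro
ATTACH 'db1.db' AS db1;
CREATE TABLE db1.tbl AS SELECT 42 AS x, 3 AS y;
CREATE MACRO db1.two_x_plus_y(x, y) AS 2 * x + y;
-- target database
ATTACH 'db2.db' AS db2;
-- copy schema objects and data from db1 to db2
COPY FROM DATABASE db1 TO db2;
-- both the table and the macro are now available in db2
SELECT db2.two_x_plus_y(x, y) AS z FROM db2.tbl;
```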
┌───────┐
│ z │
│ int32 │
├───────┤
│ 87 │
└───────┘
To only copy the schema of db1 to db2 but omit copying the data, add SCHEMA to the statement:
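A self-contained sketch of the schema-only variant follows. It uses in-memory databases under the hypothetical aliases schema_src and schema_dst instead of db1 and db2.

```sql
ATTACH ':memory:' AS schema_src;
CREATE TABLE schema_src.tbl AS SELECT 42 AS x, 3 AS y;
ATTACH ':memory:' AS schema_dst;
-- copy only the schema objects, not the rows
COPY FROM DATABASE schema_src TO schema_dst (SCHEMA);
-- the table exists in the target but is expected to contain no rows
SELECT count(*) FROM schema_dst.tbl;
```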
Syntax
Format‑Specific Options
CSV Options The below options are applicable when writing CSV files.
| Name | Description | Type | Default |
|---|---|---|---|
| compression | The compression type for the file. By default this will be detected automatically from the file extension (e.g., file.csv.gz will use gzip, file.csv will use none). Options are none, gzip, zstd. | VARCHAR | auto |
| force_quote | The list of columns to always add quotes to, even if not required. | VARCHAR[] | [] |
| dateformat | Specifies the date format to use when writing dates. See Date Format. | VARCHAR | (empty) |
| delim or sep | The character that is written to separate columns within each row. | VARCHAR | , |
| escape | The character that should appear before a character that matches the quote value. | VARCHAR | " |
| header | Whether or not to write a header for the CSV file. | BOOL | true |
| nullstr | The string that is written to represent a NULL value. | VARCHAR | (empty) |
| quote | The quoting character to be used when a data value is quoted. | VARCHAR | " |
| timestampformat | Specifies the date format to use when writing timestamps. See Date Format. | VARCHAR | (empty) |
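As a sketch of how several of these options combine, the statement below writes a small query result as CSV. The file name options_demo.csv, the column names, and the chosen format string are illustrative.

```sql
COPY (
    SELECT 42 AS a, CAST(NULL AS VARCHAR) AS b, DATE '2024-01-31' AS d
) TO 'options_demo.csv' (
    FORMAT CSV,
    DELIMITER '|',          -- column separator
    HEADER true,            -- write a header line
    NULLSTR 'NULL',         -- text written for NULL values
    DATEFORMAT '%d/%m/%Y'   -- format used when writing DATE values
);
```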
Parquet Options The below options are applicable when writing Parquet files.