-
Notifications
You must be signed in to change notification settings - Fork 70
Description
SQL clients often have a user config knob that can be used to specify how NULL values are presented. For formats with a well-defined spec like JSON or SUP it probably makes sense to always stick with the standard, but for looser formats like CSV, TSV, text, and table users would likely appreciate a similar knob with super.
Details
Repro is with super commit 9fcbd22.
We'll run our examples with the following sample data.
$ super -version
Version: 9fcbd222b
$ echo '{a:1} {b:2}' | super -f parquet -o data.parquet -c "fuse" -
$ super -s data.parquet
{a:1,b:null::int64}
{a:null::int64,b:2}
For reasons I think have to do with the roots of super, with text and table formats, right now null values are rendered as -.
$ super -f text data.parquet
1 -
- 2
$ super -f table data.parquet
a b
1 -
- 2
As touched on in #5868, with CSV and TSV, they're rendered as empty values.
$ super -f csv data.parquet
a,b
1,
,2
$ super -f tsv data.parquet
a b
1
2
By default DuckDB renders them as NULL, but offers the -nullvalue flag to vary this, such as of a user wanted to match what SuperDB currently does.
$ duckdb --version
v1.3.0 71c5c07cdd
$ duckdb --csv -c "SELECT * FROM 'data.parquet';"
a,b
1,NULL
NULL,2
$ duckdb --csv --nullvalue '-' -c "SELECT * FROM 'data.parquet';"
a,b
1,-
-,2
As also touched on in #5868, ClickHouse renders with \N by default, but they have several ways to vary this, including per-format knobs, e.g.,
$ clickhouse --version
ClickHouse local version 25.5.2.47 (official build).
$ clickhouse --format=csv --query "SELECT * FROM 'data.parquet';"
1,\N
\N,2
$ clickhouse --format=csv --format_csv_null_representation='-' --query "SELECT * FROM 'data.parquet';"
1,-
-,2
Postgres psql renders by default as an empty string, but a pset knob can vary this. Using the pg_parquet extension to load data from our Parquet data file:
pg_parquet=# CREATE TABLE data (
a BIGINT,
b BIGINT
);
CREATE TABLE
pg_parquet=# COPY data FROM '/Users/phil/work/super-5961/data.parquet';
COPY 2
pg_parquet=# SELECT * FROM data;
a | b
---+---
1 |
| 2
(2 rows)
pg_parquet=# \pset null 'NULL'
Null display is "NULL".
pg_parquet=# SELECT * FROM data;
a | b
------+------
1 | NULL
NULL | 2
(2 rows)
Context
I happened to confront this topic because I was working with a sqllogictest that outputs NULL values. Up to that point I was having pretty good luck using super -f text to get the simple outputs that could be diff'ed against the expected outputs in sqllogictest files. super's static use of - creates a challenge for me here personally, but as evidenced by the presence of the knobs in other SQL clients I expect other users may see it as a problem as well.